YouTube and Linguistic Variation Analysis

Bridget Goodman
Nazarbayev University
Astana, Kazakhstan

The Study of Language Variation

In the fields of sociolinguistics and language education, one of the key areas of study is the investigation of language variation and style shifting (e.g. Jaspers, 2010). Variationists take the point of view that groups of speakers may exhibit unique phonological, lexical, or grammatical features. These patterns, including patterns of mixing languages, have been shown to be systematic and rule-based.

Through quantitative analysis (e.g. Sharma & Rickford, 2009), scholars can show the linguistic contexts in which certain types of variation from a mythical “standard” language are more likely to appear. Quantitative analyses can also show the social contexts (e.g. homes versus department stores) in which variations are likely to appear, and how they pattern with characteristics of speakers and interlocutors such as race, gender, and class (Labov, 2006; Mather, 2012).

Through a qualitative or social constructivist lens, such variations, and alternations between language varieties more generally, can be interpreted as markers of membership (or lack thereof) in a community, or as an interactive process of identity construction.

In both types of analyses, the goal is often to demonstrate that such variations are not random deviations, but are part of a principled and intentional communicative system. At the same time, scholars recognize the power differentials and constraints in employing these communicative repertoires.

Naturalistic, Artistic, and Artificial Media as Data Sources?

When I teach these concepts to my Master’s students in the Multilingual Society course at Nazarbayev University Graduate School of Education, I find myself using YouTube videos to illustrate examples of variation and the concepts that underlie them. At times I use relatively academic sources such as a Vox news article that analyzes the pronunciation of the letter “r” by Bernie Sanders and other New Yorkers, with commentary from a linguistics professor on historical, class, ethnic, and regional patterns of variation. More often, however, I find myself using examples from general media, e.g.:

I classify the examples above as naturalistic because the tokens of language emerge from naturally occurring talk. They are distinguishable from artistic media such as songs that mix multiple languages or varieties, in that the latter are constructed for particular communicative purposes or entertainment effect. A popular choice of artistic media among my students from Kazakhstan for quantitative analysis of language mixing is Хочу сені полюбить [I want to love you], written as a duet in a mix of Kazakh and Russian.

However, the distinction between naturalistic and artistic media is not clear-cut. “Naturalistic” speech in public media by public figures can also be interpreted as a performance (Goffman, 1959; Butler, 1990) or act of identity (Le Page & Tabouret-Keller, 1985). In the interview clip mentioned above, Dave Chappelle talks about his conscious choices to use “vernacular” with his friends (including his audience) and the “job interview” style with television executives. Moreover, Chappelle in that moment is simultaneously an analyst of his own linguistic practices and someone who provides data for further analysis and reflection by scholars. Both of these positionings can make tremendous contributions to our understanding of language variation and style shifting.

While Chappelle is voicing and commenting on his own language styles, Trevor Noah’s voicing of Russian-accented English and native Russian in his comedy special “Afraid of the Dark” is an artificial construction based on stereotypical understandings of features of English and Russian spoken by native Russian speakers; my Russian-speaking students identified his Russian as nonsense words. This is potentially more problematic, but Noah situates that portrayal in a broader conversation in which he recognizes that his stereotypes of Russian speakers are based on his language ideologies, i.e. associations made about speakers based on the variety they are speaking:

Strangest discovery I made. The Russian accent makes me fear. The Russian language does not. Because a language is something someone else speaks. An accent is me interpreting how they’re using mine. It’s a completely different thing.

Limitations of YouTube Videos for Quantitative Analysis

For quantitative analysis, the more artificial the language in a YouTube video is, the more difficult it becomes to justify its use. First, if it has been produced by a non-native speaker of a language variety for a comedic or abusive effect, its veracity as a token is in doubt. On this basis, I rejected one student’s wish to analyze a Ukrainian accent from a Russian-language TV show because, based on my knowledge of Ukrainian-Russian sociolinguistics, it was not representative of the Ukrainian accent. The show itself was already an adaptation of the American comedy show The Nanny, and thereby an adaptation of choices about how to represent an accent. On similar grounds, I would steer a student away from an analysis of Daphne’s or her brother Simon’s Manchester accent in the 1990s sitcom Frasier, as multiple sources in media and fan groups have commented on how inaccurate those portrayals are.

Even if the purpose of the analysis is to document linguistic features stereotypically produced by non-native speakers, such analysis can reinforce rather than critique such stances in speech. In addition, variationists generally aim to elicit quantitative data from speakers who are not explicitly aware of their speech. Alternatively, researchers control the comparison of data from more careful (i.e. reading, repeated) and spontaneous speech.

YouTube Videos and Comments: The Qualitative Edge

For qualitative analysis, the range of language produced in online videos, and the range of speakers and purposes of language from diverse contexts, provide ample data for students and teachers to interpret. For example, a graduate student presented me with an artistic example, the song “Waka Waka” by Shakira and Freshlyground. The student comprehensively interpreted the use of three languages within its social context as follows:

English has a global language status and is comprehensible for most of the population of the world. Spanish reveals the main singer’s national identity. As for the use different African languages, it emphasizes the fact that the football World Cup takes place in Africa and adds African style to the song (Akshalova, 2018).

Moreover, she used additional online sources such as an interview with one of the performers to inform her interpretation.

Another student chose to analyze a comedy skit from the British TV show Burnistoun about an elevator’s voice recognition system and its inability to recognize a Scottish accent. While this is an artificial production of speech and interaction, the student was able to collate multiple comments posted by viewers in response to the video, e.g. “I’m living in Edinburgh and… yes, it may happen:D”, and “Haha this is so funny I’m from Scotland myself and it’s dead true tbh”.

At the same time, the student was able to conclude by problematizing the phenomena inherent in the humorously portrayed situation:

The analysis of the YouTube video ‘Burnistoun- Voice Recognition Elevator in Scotland’ revealed that the stereotypes regarding the accents are not always formed by society as a whole, in some cases representatives of the same speech community perceive the other variations of their language differently; possible explanations for giving an advantage to the specific accent could be adaptation to the accent of the dialogue partner or hierarchy of the language variations. The scriptwriters and actors excellently showed the difficulties which the speakers of less ‘popular’ language variations face in reality. Ignoring the existence of that diversity may lead to the reoccurrence of such unpleasant situation (Kazymbek, 2018).

Final Reflections 

Writing this blog submission helped me think through the question, “Are the diverse media on YouTube appropriate for professors and students to use as linguistic data?” The answer in nearly all cases is yes. In addition, there is emerging evidence to suggest that commentary by viewers combined with metacommentary within YouTube videos may be additional beneficial sources of information for students and teachers to interpret linguistic data, as they expand our understanding of language and citizen sociolinguistics (Rymes & Leone, 2014).

References (Offline)

Butler, J. (1990). Gender trouble: Feminism and the subversion of identity. New York: Routledge.

Goffman, E. (1959). The presentation of self in everyday life. New York: Anchor Books.

Labov, W. (2006). The social stratification of English in New York City (2nd ed.). Cambridge: Cambridge University Press.

Le Page, R. B., & Tabouret-Keller, A. (1985). Acts of identity: Creole-based approaches to language and ethnicity. Cambridge: Cambridge University Press.

Author Bio:

Bridget Goodman is Assistant Professor and Director of the MA in Multilingual Education Program at Nazarbayev University Graduate School of Education, Astana, Kazakhstan. Her teaching and research interests include: the use of the first language (L1) in second and foreign language classrooms, language policy, and sociolinguistics in post-Soviet countries.

LinkedIn: [I coordinate the page for the SIG]
Twitter: @bridgetelf

