Corpus-assisted discourse studies: A useful way of investigating language attitudes in social media data?

Ethan Kutlu and Ruth Kircher

Many people use social media to find out about the latest news, to stay connected with one another, and/or to post hilarious cat videos. Social media sites have become important platforms for people to share and discuss their world views. Often, such discussions turn into heated debates – and it is not uncommon for the focus of attention to be on language. Remember the arguments about Laurel vs. Yanny?! Frequently, these “wars of words” don’t just deal with individual words but with entire language varieties, as evidenced for example by the tweets posted and shared by @AccentismProj. Thus, people clearly use social media to express their attitudes towards different language varieties and their speakers. However, to date, there is little research into whether and to what extent expressions of language attitudes in online spaces differ from those in offline spaces – and specifically, about whether the former mirror the latter or whether the (relative) anonymity afforded in online spaces affects the expressions of language attitudes.

Could corpus-assisted discourse studies (CADSs) be a useful way of shedding light on the expression of language attitudes in social media data? The underlying idea of CADSs is to combine corpus linguistics and discourse analysis – that is, finding patterns in large collections of texts as well as analysing specific text fragments in more detail (e.g. Baker 2006). The main focus of CADSs is on the following:

  • the determination of frequencies – i.e. highly frequent words and phrases, because ‘words that are repeated […] are understood to have a particular function within the society producing the texts’ (Vessey 2016: 5);
  • the analysis of collocations – i.e. the investigation of the words with which pertinent terms tend to co-occur, because ‘isolated words are not understood to be meaningful on their own’ (Vessey 2016: 5); and
  • the analysis of concordance lines – i.e. lines which ‘present an individual lexical item within its co-text across numerous texts’ (Vessey 2016: 5) – as well as larger discourse segments.
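To make these three steps more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the three example tweets are invented (they are not data from any study), the function names are hypothetical, and collocates are ranked by raw co-occurrence counts within a small window, whereas dedicated corpus tools usually rank them with statistical association measures.

```python
import re
from collections import Counter

# Hypothetical toy corpus, invented for illustration (not data from any study).
tweets = [
    "I love speaking Spanish with my abuela",
    "Speaking Spanish at work should not get you fired",
    "My kids are learning Spanish and I love it",
]


def tokenize(text):
    """Lowercase a text and split it into word tokens (accented letters kept)."""
    return re.findall(r"[a-záéíóúñü']+", text.lower())


def word_frequencies(texts):
    """Count how often each word occurs across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def collocates(texts, node, window=2):
    """Count the words found within `window` tokens either side of `node`."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        for i, token in enumerate(tokens):
            if token == node:
                left = tokens[max(0, i - window):i]
                right = tokens[i + 1:i + 1 + window]
                counts.update(left + right)
    return counts


def concordance(texts, node, width=3):
    """Return one line per hit, showing `node` with `width` tokens of co-text."""
    lines = []
    for text in texts:
        tokens = tokenize(text)
        for i, token in enumerate(tokens):
            if token == node:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                lines.append(f"{left} [{token}] {right}".strip())
    return lines


frequencies = word_frequencies(tweets)
spanish_collocates = collocates(tweets, "spanish")
spanish_lines = concordance(tweets, "spanish")

print(frequencies.most_common(5))
print(spanish_collocates.most_common(5))
for line in spanish_lines:
    print(line)
```

The frequency and collocation counts point to patterns worth a closer look, and the concordance lines then let the analyst read each occurrence in its co-text, which is where the discourse-analytic part of a CADS begins.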

CADSs have been used very effectively to investigate ideologies (e.g., Orpin 2005; Vessey 2015; Kircher & Fox 2019) – but to date, there is almost no CADS research regarding language attitudes (with the only exception we know of being Jaworska & Themistocleous 2018).

In our latest study (Kutlu & Kircher 2021), we aimed to find out if CADSs constitute a useful way of investigating language attitudes in social media data by employing this method to examine attitudes towards Spanish as a heritage language in Florida. We focused on Twitter data because Twitter is a public domain website, it allows for data collection over a specific period of time and for a specific location (as well as archival data, depending on the researchers’ needs), and the tools needed for the data collection process are open access. We used the R package rtweet to download tweets from Florida over a 12-week period, using keywords such as “Spanish” and “español.” Based on the downloaded tweets, we created an English corpus (183,278 tweets amounting to 5,405,947 words) and a Spanish corpus (20,959 tweets amounting to 525,425 words). We investigated frequencies, collocations, concordance lines, and larger text segments from our corpora.

So what did our analysis reveal? Here are the key findings:

  1. We found evidence of the same evaluative dimensions in our online data that are commonly found in offline data – namely, status and solidarity (e.g. Giles & Watson 2013). Specifically, there was evidence of primarily negative attitudes towards Spanish on the status dimension and primarily positive attitudes on the solidarity dimension. However, despite the latter, transmission and use of Spanish seemed to be affected by pressure to assimilate and fear of negative societal repercussions.
  2. We also found notable differences between our two corpora: Spanish was used less frequently than English to tweet about attitudes. Instead, Spanish was often used to attract Twitter users’ attention to links leading to particular websites, shows, and sports events that Spanish speakers could watch in Spanish.

Overall, the findings based on our online data certainly appear to reflect the offline situation in Florida – including the commonly occurring language-based discrimination of the state’s Spanish-speaking population (e.g., The Associated Press 2019) and the comparatively low vitality of Spanish in Florida (e.g., U.S. Census Bureau 2019). A more detailed discussion of our findings can be found in our recent open-access article (Kutlu & Kircher 2021). Of course, we cannot make any generalisations based on one exploratory study, and further research is necessary to investigate the links between attitudes expressed in offline and online contexts. Nonetheless, our findings offer meaningful insights into attitudes towards Spanish as a heritage language in Florida – and they suggest that CADSs do constitute a useful way of investigating language attitudes in social media data.

What about you? Have you ever encountered negative attitudes towards your own way of speaking on social media? Or have you witnessed friends or online acquaintances experiencing this? If so: How do you usually react in such situations – and do you worry that language attitudes expressed on social media might reinforce linguistic discrimination in offline spaces? We are curious to hear about your views and strategies. Tell us in the comments section below!


Baker, P. (2006). Using Corpora in Discourse Analysis. London: Continuum.

Giles, H., and Watson, B. M. (2013). The Social Meanings of Language, Dialect and Accent: International Perspectives on Speech Styles. New York: Peter Lang.

Jaworska, S., and Themistocleous, C. (2018). Public discourses on multilingualism in the UK: Triangulating a corpus study with a sociolinguistic attitude survey. Language in Society 47 (1): 57-88.

Kircher, R., and Fox, S. (2019). Multicultural London English and its speakers: A corpus-informed discourse study of standard language ideology and social stereotypes. Journal of Multilingual and Multicultural Development, online ahead of print.

Kutlu, E., and Kircher, R. (2021). A corpus-assisted discourse study of attitudes towards Spanish as a heritage language in Florida. Languages 6 (1): 1-18.

Orpin, D. (2005). Corpus linguistics and critical discourse analysis: Examining the ideology of sleaze. International Journal of Corpus Linguistics 10 (1): 37-61.

The Associated Press. (2019). Florida Nurses: Clinic Warns Only Speak English or Be Fired. Latino Rebels. Available online.

U.S. Census Bureau. (2019). Hispanic or Latino Origin by Specific Origin, 2019 American Community Survey 1-Year Estimates. Available online.

Vessey, R. (2015). Corpus approaches to language ideology. Applied Linguistics 38: 277-96.

Vessey, R. (2016). Language ideologies in social media: The case of Pastagate. Journal of Language and Politics 15: 1-24.

Ethan Kutlu is a postdoctoral researcher in the University of Florida’s Psychology department. His work focuses on multilingual experiences and how these experiences shape language and cognition. Ethan also investigates these experiences from social aspects, where he focuses on attitudes towards multilingualism and multilinguals’ own attitudes towards their language experience. You can access Ethan’s website here, or find him on Twitter: @ethankutlu

Ruth Kircher is a researcher at the Mercator European Research Centre on Multilingualism and Language Learning, which is part of the Fryske Akademy in Leeuwarden (Netherlands). As a sociolinguist with a specialisation in societal multilingualism and language contact situations, her work focuses especially on language attitudes, ideologies, and practices as well as language policy and planning. Ruth has a particularly strong interest in these issues with regard to autochthonous and migrant minorities. This is Ruth’s website and you can also find her on Twitter: @ruth_kircher

