By: Will Allen, Research Officer
Today, the Migration Observatory published a new study that quantitatively examined how UK national newspapers portrayed immigrants, migrants, asylum seekers and refugees from 2010 to 2012. Analysis of over 58,000 items from 20 newspapers, totalling more than 43 million words, revealed many interesting patterns, including these:
- By far, immigrants were most commonly described as ‘illegal’ across tabloid, mid-market and broadsheet newspapers
- Asylum-seekers were most commonly described as ‘failed’ across these three publication types
- Words focused on conflict, nationalities and movement, such as ‘fleeing’, tended to appear alongside mentions of ‘refugees’ more than alongside the other groups
The full report comes with complete statistical results of the analysis, as well as an online visualisation tool that allows users to compare the results across publication types and migrant groups.
To complement the findings, I also wanted to highlight a particular characteristic of the study: comprehensiveness. As a key feature of the Observatory’s work, comprehensiveness means covering, as far as possible, all aspects of a migration issue using high-quality data sources. But in the case of analysing how the UK press describes migrant groups, what does ‘comprehensiveness’ accomplish, and why does it matter? By walking through the steps taken to produce the study, I want to illustrate some of the challenges and opportunities that come with quantitatively analysing texts. Specifically, to illustrate how the team enacted this feature, I will explore three stages of the project:
- designing the scope of the study and collecting the data
- analysing the data using computer-assisted corpus linguistics methods and
- visualising the results in an interactive format.
‘Never assume, because when you ASSUME…’
There is a well-known catchphrase about the effects of assuming too much. Taken light-heartedly, it simply warns us of the danger of not communicating effectively. When examining news coverage, however, assumptions (and their companions, bias and prior expectations) can skew researchers’ decisions in crucial ways. In a study of newspaper texts, these decisions can manifest themselves in the scope and direction of the project. Researchers might choose to include only certain publications they think will generate ‘interesting’ findings, focus on a time period popularly thought to feature immigration stories, or include only those parts of a newspaper they imagine will mention UK migration, such as the ‘national’ section.
Such decisions may be intentionally made in order to answer specific questions (such as “to what extent do broadsheet news stories use negative language to describe asylum seekers?” or “how did the UK press cover immigration during the 2010 General Election?”). But our research questions reflected a commitment to comprehensiveness by intending to capture, as far as possible, how all of Britain’s national newspapers—across all paper sections—covered multiple migrant groups over a relatively long time period. These decisions stemmed from a realisation that it would be inappropriate to bring assumptions about where and when immigration issues ‘should’ or ‘are more likely’ to appear. After all, the study came from a simple yet vital rationale: before making claims about how or why the press covers immigration in certain ways, it is necessary to first understand what the press has actually said.
What corpus linguistics can (and can’t) do
Implementing this rationale in the analysis stage, we faced two challenges. First, on a practical level, how could over 58,000 items be reliably collected and coded? Second, how could significant findings and ‘real’ patterns be distinguished from the rest: in other words, how could we separate signal from noise? Overcoming prior assumptions, tempering the human tendency to identify patterns even where none exist (see, for instance, Daniel Kahneman’s work on heuristics), and practically managing the fatigue that leads to coding errors were all paramount in developing our approach.
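The signal-from-noise problem is typically handled in corpus linguistics with keyness statistics such as Dunning’s log-likelihood, which compares a word’s observed frequencies in two corpora against what equal distribution would predict. The Python sketch below illustrates the calculation with invented counts; it is not the study’s actual procedure or data.

```python
import math

def log_likelihood(a, b, c, d):
    """Dunning's log-likelihood (G2) for a word occurring `a` times in a
    corpus of `c` tokens and `b` times in a corpus of `d` tokens.
    Values above ~3.84 suggest significance at p < 0.05 (1 d.f.)."""
    e1 = c * (a + b) / (c + d)  # expected frequency in corpus 1
    e2 = d * (a + b) / (c + d)  # expected frequency in corpus 2
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2

# Hypothetical counts: a word appearing 200 times in one corpus but only
# 50 times in an equally sized comparison corpus.
print(round(log_likelihood(200, 50, 100_000, 100_000), 2))  # ≈ 96.37
```

A result of 96.37 is far above the 3.84 critical value, so a frequency difference like this would be treated as a real pattern rather than noise; applying one fixed threshold to every comparison is what keeps such decisions consistent and transparent.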
Corpus linguistics emerged as a method that could be helpful in this regard. Basically, a ‘corpus’ is a large collection of texts. In contemporary use, these collections are stored digitally, which enables researchers to search and analyse them quickly using various software programmes. Using a large set of texts in this way offers several advantages over conventional or smaller-scale discourse analyses. On a practical level, corpus approaches remove the initial need for humans to individually read and code every item. Crucially, on a methodological level, they significantly reduce researcher bias by starting the analysis from a wide foundation of data that was not built to confirm a prior expectation. They also provide a clear framework within which researchers must base their decisions: although some subjective decisions about where to place cut-off marks for statistical tests remain, corpus approaches demand that they be applied consistently and transparently. Put another way, “it becomes less easy to be selective about a single newspaper article when we are looking at hundreds of articles—hopefully, overall patterns and trends should show through” (Baker 2006: 12).
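To give a flavour of the kind of pattern this software surfaces, here is a minimal Python sketch that counts the words immediately preceding a target term, in the spirit of the finding that ‘illegal’ was the most common modifier of ‘immigrants’. The toy corpus and the simple tokeniser are invented for illustration; the study itself used dedicated corpus tools, not this code.

```python
import re
from collections import Counter

def preceding_words(texts, target):
    """Count the word immediately preceding each mention of `target`."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        for i, tok in enumerate(tokens):
            if tok == target and i > 0:
                counts[tokens[i - 1]] += 1
    return counts

# Invented toy corpus standing in for newspaper items (not the study's data).
corpus = [
    "Police arrested three illegal immigrants on Tuesday.",
    "Illegal immigrants were detained at the port.",
    "Skilled immigrants contribute to the economy.",
]
print(preceding_words(corpus, "immigrants").most_common(2))
# → [('illegal', 2), ('skilled', 1)]
```

Because the counting rule is applied identically to every text, no single article can dominate the result: the most frequent modifiers emerge from the whole collection.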
Yet corpus and other quantitative approaches to textual analysis have certain limitations, depending on the types of questions being asked of them. For instance, reporting a statistically significant pattern of language does not, by itself, tell us much about how that language is used, or about the different contexts and audiences for which an article was written. In the Observatory report, we are clear that our research questions did not ask why newspapers used certain language in coverage about immigration, or whether this language was positive or negative.
Show me the goods (and let me play with them, too): Visualising the results
Comprehensiveness is also about communicating the results of such a wide-ranging study in ways that are valid and accessible. This is why the report includes an interactive visualisation of the results and makes available the complete lists of linguistic patterns around each of the four migrant groups. Interactivity is key here because it allows users to directly engage with the full results and come to their own conclusions based on the evidence.
We also acknowledge the need to show how our observed statistical results actually appear in context. Therefore, we provided real-world examples from our corpus to illustrate many of the main patterns that the statistical analysis revealed. This is another benefit of using corpus approaches: software such as WordSmith Tools and Sketch Engine allows researchers to move among different levels in the data, from macro-level trends across publication types down to micro-level sentences, and to check the quantitative results with qualitative reading. Such cross-checking from quantitative to qualitative evidence is encouraged in applications of corpus linguistics.
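Moving from macro-level counts down to individual sentences is usually done through concordance, or keyword-in-context (KWIC), views of the kind those tools provide. The sketch below is a bare-bones illustration of the idea, with an invented example sentence rather than text from the study’s corpus.

```python
import re

def kwic(texts, target, width=4):
    """Keyword-in-context lines: up to `width` tokens either side of each hit."""
    lines = []
    for text in texts:
        tokens = text.split()
        for i, tok in enumerate(tokens):
            # Strip punctuation and compare case-insensitively.
            if re.sub(r"\W", "", tok.lower()) == target:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                lines.append(f"{left:>30} | {tok} | {right}")
    return lines

# Invented example sentence, not drawn from the study's corpus.
corpus = ["Thousands of refugees fleeing the conflict crossed the border."]
for line in kwic(corpus, "refugees"):
    print(line)
```

Aligning every hit on the keyword in this way lets a researcher read dozens of occurrences at a glance and judge whether the statistical pattern, such as ‘fleeing’ appearing near ‘refugees’, holds up qualitatively.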
So what?
Media coverage of migration in the UK is likely to remain a controversial issue. Equally, statements about the role of the press in shaping, channelling and fuelling the issue will continue to be made. Given this environment, it is vital to present and value evidence rather than assertion. The pursuit of comprehensiveness in media analysis therefore involves viewing the press in its entirety as far as possible: quantifying and empirically documenting how the press has actually talked about an issue, and then making that analysis available for public interpretation. Some academics, including those at the ESRC Centre for Corpus Approaches to Social Sciences, are making ground-breaking progress in applying such large-scale approaches to textual analysis. The Migration Observatory also seeks to advance this value in its projects, of which today’s report is a key example.