Data Workshop Report

With Omicron at large, the festive season approaching and rumour and misinformation continuing to hinder the vaccination of a significant minority of the UK population, our research into Covid Rumours in Historical Context has seldom felt more relevant. The team recently held an online workshop with members of four other academic research projects in this field, to discuss the merits and limitations of our different datasets and methodologies.

These projects shared the premise that the ongoing pandemic has catalysed the generation of rumours and the public’s uptake of pre-existing conspiracy theories. Recognising this, researchers have sought to better understand the genesis, mutability and transmission of these potentially harmful narratives by analysing data from social media platforms, as well as other sources. During the workshop, participants were unanimous about the vital importance of researching the spread of mis- and disinformation online, as increasing numbers of people turn to the internet to answer their health concerns, and as public health authorities use sites like Twitter to broadcast announcements and advice. While a better understanding of this issue is an obvious prerequisite for any attempt to ameliorate the problem, investigating the beliefs of conspiracy theorists, a community already deeply suspicious of surveillance, has not been easy.

The challenges of conducting research in this field are manifold. The pandemic coincided with a rush to privatise messaging groups, altering the representativeness of the data that remained publicly accessible. Evidence sourced from social media platforms is also highly ephemeral: some 40–50% of our own project’s corpus of tweets was removed from Twitter within a single year, by users and content moderators. This raises questions about the reproducibility of our research. Moreover, the conspiracy theorists and rumourmongers who continue their discussions in public are often well versed in the concept of plausible deniability, and are forever tweaking their hashtags and spellings or developing more exclusionary jargon, precisely in order to disguise the true nature of their views and to frustrate bots and other automated detectors.
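Measuring this kind of attrition is, in principle, a simple comparison between the tweet IDs collected at the outset and those that can still be retrieved later. The sketch below is illustrative only: the function name `attrition_rate` and the sample IDs are hypothetical, and it assumes both ID lists have already been obtained (it does not query Twitter).

```python
def attrition_rate(collected_ids, still_available_ids):
    """Return the fraction of originally collected tweet IDs that can no longer be retrieved."""
    collected = set(collected_ids)
    if not collected:
        raise ValueError("no tweet IDs were collected")
    # IDs retrieved later that were never in the original collection are ignored.
    missing = collected - set(still_available_ids)
    return len(missing) / len(collected)


if __name__ == "__main__":
    # Hypothetical example: 10 IDs collected, 6 still retrievable a year later.
    collected = [str(i) for i in range(1, 11)]
    still_available = ["1", "2", "3", "5", "8", "9"]
    print(f"{attrition_rate(collected, still_available):.0%} no longer retrievable")  # → 40% no longer retrievable
```

Re-running the same comparison at intervals would show how quickly a corpus decays, which bears directly on whether an analysis can be reproduced later.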

However, an even more fundamental problem faces researchers in this field, namely accessing the actual views of the people sharing rumours, memes and misinformation online. It is nearly impossible to know whether a subversive image or idea was shared in a spirit of ironic humour or of sincere belief, and the responses of those exposed to such content span an equally broad spectrum, from chuckling at the perceived idiocy of others, to thinking twice about vaccination, to burning down a 5G mast. Fortunately, these thorny problems of intention and reception were beyond the scope of the projects presented at our recent workshop, which instead focused on what the data can best illustrate: transmission. Despite the challenges of working in this field, the potential value of greater insight far outweighs the difficulties, and indeed all five projects have seen considerable success.

Three of the projects focused solely on large datasets derived from social media platforms, while a fourth combined survey data with web-trace data, monitoring the internet activity of consenting participants. Our own project compared a large sample of tweets from the pandemic period with a variety of historical sources dating back to the 16th century. The common denominator was the use of large data samples to discern overarching patterns, and today’s cutting-edge computational techniques have enabled researchers to analyse and visualise data on a scale and at a depth previously unimaginable.

At least two of the workshop’s projects were running linguistic and frequency-based analyses on tens of millions of tweets, sourced from Twitter’s Application Programming Interface (API). This has offered researchers fascinating insight into the interrelated nature of conspiracy narratives, and their gradual convergence in response to the pandemic. Machine-learning techniques, such as training classifiers to sort the tweet corpus into thematic groups and sub-groups, have allowed for a more detailed investigation of linguistic patterns. Furthermore, the timestamp attached to each tweet has made it possible to map word frequencies against a timeline of the main events of the pandemic. Meanwhile, the datasets themselves have become rare and valuable research resources, as de-platforming and content moderation have since changed the landscape of the Twittersphere, rendering such samples unique. While the workshop participants thoroughly demonstrated the capability of these new visualisation and analysis tools, they also found that these techniques have their limitations.
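As a rough illustration of mapping word frequencies onto a timeline, the sketch below counts tracked terms per calendar month. It is a minimal stand-in, not any workshop project’s pipeline: the function name, the naive tokeniser and the sample tweets are all hypothetical, and timestamps are assumed to be ISO-8601 strings such as `2020-04-02T10:15:00` that Python’s `datetime.fromisoformat` can parse.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime


def monthly_term_counts(tweets, terms):
    """Count occurrences of tracked terms per calendar month.

    tweets: iterable of (timestamp, text) pairs, timestamps in ISO-8601 form.
    terms:  lowercase words to track, e.g. ["5g", "vaccine"] (hypothetical).
    Returns a dict mapping "YYYY-MM" to a Counter of term frequencies.
    """
    tracked = set(terms)
    counts = defaultdict(Counter)
    for timestamp, text in tweets:
        month = datetime.fromisoformat(timestamp).strftime("%Y-%m")
        # Naive tokeniser: lowercase runs of letters and digits, so "5G" becomes "5g".
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        counts[month].update(t for t in tokens if t in tracked)
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical sample tweets.
    sample = [
        ("2020-04-02T10:15:00", "5G towers spread the virus?"),
        ("2020-04-20T18:40:00", "Another 5G mast on fire"),
        ("2020-12-08T09:00:00", "First vaccine dose given today"),
    ]
    for month, freq in sorted(monthly_term_counts(sample, ["5g", "vaccine"]).items()):
        print(month, dict(freq))
    # → 2020-04 {'5g': 2}
    #   2020-12 {'vaccine': 1}
```

The per-month counts could then be set against a timeline of key pandemic events; a real corpus would also need to cope with the shifting hashtags and deliberate misspellings described above.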

The main limits on big data research include ethical issues, computational capacity, and the need for human input and supervision to increase the accuracy of the AI. Questions were asked in the workshop about what kinds of research were permitted by the terms of service agreed between social media companies and their users. Others raised the issue of how much data could be stored and processed at a reasonable speed with each project’s available compute power. Some found the results of big data queries decidedly mixed, reflecting the mechanical conditions put in place more than the organic truths hidden in the data itself. For instance, the self-censorship of conspiracy theorists on social media, and their encoded and ever-evolving language, can turn the data into a veritable sea of ambiguities, making the highly literal and indiscriminate results of computational pattern recognition harder to interpret. Nevertheless, AI’s capacity to augment and amplify the judgements and expertise of human researchers, by learning from their annotation of tweets and images, does much to redeem the utility of such tools, and the stunning dashboards and visualisations of the data shared at the workshop speak for themselves.
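To illustrate how human annotation can train a classifier, here is a minimal multinomial naive Bayes sketch with add-one smoothing. It is one classic technique chosen for brevity, not the model used by any of the workshop projects; the class name, labels and example texts are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        # Learn from human-annotated examples: one label per text.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Log prior plus log likelihood of each known token under this label.
            score = math.log(doc_count / n_docs)
            for token in tokenize(text):
                if token not in self.vocab:
                    continue  # words absent from the annotated data carry no evidence
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    # Hypothetical annotations, standing in for thousands of hand-labelled tweets.
    texts = [
        "5g masts spread the virus",
        "the vaccine contains a microchip",
        "vaccine microchip tracks you",
        "5g radiation causes covid",
        "lovely walk in the park today",
        "new cafe opened in the park",
    ]
    labels = ["conspiracy"] * 4 + ["other"] * 2
    model = NaiveBayes().fit(texts, labels)
    print(model.predict("they put a microchip in the vaccine"))  # → conspiracy
    print(model.predict("a walk in the park"))  # → other
```

Such a model can only echo the categories its annotators supply: the human researchers’ labels, not the algorithm, define what counts as a conspiracy narrative.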

While the computational power harnessed by each of the projects has done wonders in illuminating macroscopic trends that a single person could not have processed in a lifetime, the workshop’s participants were unanimous about the continued importance of the human researcher in organising and interpreting this data. The question of whether a computer could ever be taught to identify conspiracy theories consistently was of great interest to the participants, most of whom thought it could not. It was remarked that even the work of Timothy Tangherlini of UC Berkeley, who has gone furthest in this direction by training an AI in the principles of narratology, has yet to produce an understanding of conspiracy theories on a par with that of a human researcher. Humans, it seems, are still integral to this research, from the input level of categorising thousands of tweets to train the classifiers, to the output level, where an advanced synthesis of experience, meaningful comprehension and judgement is required to interpret and share the results. In short, a dictum of Garry Kasparov’s comes to mind: “A good human plus a machine is the best combination.”

In further exploring what makes for ‘a good human’ in the context of rumour and conspiracy research, we might consider the unusually multidisciplinary group of people present at the workshop, with backgrounds in computer science, social science and the humanities. Several of the projects had allied the distant-reading capabilities of data science with the close-reading techniques of humanities scholars to great effect.

This method is exemplified in our own project’s mission to place Covid rumours in their historical context. We believe that in order to draw truly compelling and generalisable conclusions about rumours and conspiracies, it is necessary to consider their longer-term historical manifestations. Indeed, the past seems to matter a great deal to all the players in this field, from conspiracy theorists linking present crises with longer-term narratives, to policymakers needing to distinguish the perennial from the contingent, to social media companies looking to downplay their role in exacerbating the problem by pointing to historical parallels. Even when working with contemporary data, the historical mindset comes into its own, for historians are already adept at piecing together fragmentary sources, accounting for bias, and restoring evidence to its full context.

In closing, this workshop did more than consider the ins and outs of each project’s data and methodologies; it afforded the participants a tantalising glimpse of what is possible when diverse approaches are rigorously applied to one of the most serious problems facing society today. It is in large part thanks to the forward-thinking funding of organisations like the AHRC that cutting-edge technologies and the critical and interpretive insight of the humanities can combine for the benefit of our embattled public sphere.

The Team, 03/12/2021
