The structure of the EU mediasphere.

Diego Di Bernardo,Ilias Flaounas,Justin Lewis,Nello Cristianini,Nick Mosdell,Tijl De Bie,Marco Turchi,Nick Fyson,Omar Ali

doi:10.1371/journal.pone.0014243


Open Access



Abstract

Background: A trend towards automation of scientific research has recently resulted in what has been termed “data-driven inquiry” in various disciplines, including physics and biology. The automation of many tasks has been identified as a possible future also for the humanities and the social sciences, particularly in those disciplines concerned with the analysis of text, due to the recent availability of millions of books and news articles in digital format. In the social sciences, the analysis of news media is done largely by hand and in a hypothesis-driven fashion: the scholar needs to formulate a very specific assumption about the patterns that might be in the data, and then set out to verify if they are present or not.

Methodology/Principal Findings: In this study, we report what we think is the first large scale content-analysis of cross-linguistic text in the social sciences, by using various artificial intelligence techniques. We analyse 1.3 M news articles in 22 languages detecting a clear structure in the choice of stories covered by the various outlets. This is significantly affected by objective national, geographic, economic and cultural relations among outlets and countries, e.g., outlets from countries sharing strong economic ties are more likely to cover the same stories. We also show that the deviation from average content is significantly correlated with membership to the eurozone, as well as with the year of accession to the EU.

Conclusions/Significance: While independently making a multitude of small editorial decisions, the leading media of the 27 EU countries, over a period of six months, shaped the contents of the EU mediasphere in a way that reflects its deep geographic, economic and cultural relations. Detecting these subtle signals in a statistically rigorous way would be out of the reach of traditional methods. This analysis demonstrates the power of the available methods for significant automation of media content analysis.

Highlights

A trend towards automation of scientific research has recently resulted in what has been termed “data-driven inquiry” in various disciplines, including physics and biology [1,2,3,4,5].
The automation of many tasks has been identified as a possible future for the humanities and the social sciences, in those disciplines concerned with the analysis of text, due to the recent availability of millions of books and news articles in digital format [6,7].
For six countries we found fewer than ten outlets with an appropriate online presence, resulting in a set of 255 outlets; from those we successfully parsed a total of 215 for the period of study (Supporting Table S1 contains the list of news outlets).

Summary

Introduction

A trend towards automation of scientific research has recently resulted in what has been termed “data-driven inquiry” in various disciplines, including physics and biology [1,2,3,4,5]. In the social sciences, by contrast, the analysis of news media is done largely by hand and in a hypothesis-driven fashion. The most common form of analysis is labour intensive, relying upon people to physically examine, interpret or code media content [8,9]. Even those studies that employ meta-analysis struggle to find large sample sizes and consistent coding frames across the sample of different studies examined [10]. Scholars are obliged to impose their own analytical frameworks on media content: they can only find those trends or patterns that they already know or suspect are there, so they formulate very specific assumptions about these patterns and set out to verify whether they are present or not. In other words, these investigations are inherently hypothesis-driven.

Methods

Results

Conclusion