Topic Modeling for Media and Communication Research: A Short Primer

Cornelius Puschmann, Tatjana Scheffler

doi:10.2139/ssrn.2836478

Abstract

A variety of powerful tools for the automated and semi-automated analysis of textual data are increasingly at the disposal of media and communication researchers. Among these methods, the family of techniques known as topic modeling has recently attracted particular interest. What utility does one popular type of topic model, latent Dirichlet allocation (LDA), have for media and communication research? This paper illustrates some distinct strengths and weaknesses of LDA. We first briefly introduce its conceptual foundations, along with a selection of studies from the social sciences that apply it to different types of content, from newspapers and scientific publications to literary texts and social media. We then present a case study of news coverage of the Syrian civil war. After describing our data, we turn to two facets of the results in particular: the relation between terms and topics, and the proportions of topics in documents, aggregated by month. We make the case for contrastive (rather than descriptive) uses of topic modeling that build broader analyses on the initial output of a model, rather than concluding with a list of terms.

Full Text