Abstract

We present a topic modelling and data visualization methodology to examine gender-based disparities in news articles by topic. Existing research in topic modelling is largely focused on the text mining of closed corpora, i.e., those that include a fixed collection of composite texts. We showcase a methodology to discover topics via Latent Dirichlet Allocation, which can reliably produce human-interpretable topics over an open news corpus that continually grows with time. Our system generates topics, or distributions of keywords, for news articles on a monthly basis, to consistently detect key events and trends aligned with events in the real world. Findings from two years’ worth of news articles in mainstream English-language Canadian media indicate that certain topics feature either women or men more prominently and exhibit different types of language. Perhaps unsurprisingly, topics such as lifestyle, entertainment, and healthcare tend to be prominent in articles that quote more women than men. Topics such as sports, politics, and business are characteristic of articles that quote more men than women. The data shows a self-reinforcing gendered division of duties and representation in society. Quoting female sources more frequently in a caregiving role and quoting male sources more frequently in political and business roles enshrines women’s status as caregivers and men’s status as leaders and breadwinners. Our results can help journalists and policy makers better understand the unequal gender representation of those quoted in the news and facilitate news organizations’ efforts to achieve gender parity in their sources. The proposed methodology is robust, reproducible, and scalable to very large corpora, and can be used for similar studies involving unsupervised topic modelling and language analyses.

Highlights

  • Gender equality is one of the UN’s 17 Sustainable Development Goals (United Nations, 2020)

  • We analyzed a large news corpus to understand the relationship between topics in the news and the gender of those quoted

  • Our results, which consistently show that women are quoted more frequently in topics related to lifestyle, healthcare, and crimes and sexual assault, are not unexpected

Summary

Introduction

Gender equality is one of the UN’s 17 Sustainable Development Goals (United Nations, 2020). Although progress has been made, women are still not equally represented in positions of power (OECD, 2020a, 2020b; Tremblay, 2018; UN Women, 2010); are not equal in science, including in publication metrics (Berenbaum, 2019; King et al., 2017); and do not appear in the news as often as men (Desmond and Danilewicz, 2010; Hong et al., 2020; Macharia, 2015; Shor et al., 2015; Trimble et al., 2021; Van der Pas and Aaldering, 2020). The underrepresentation of women in certain areas of the news such as politics, business, or sports is well documented (Power et al., 2019; Kemble, 2020; Thomas et al., 2020; Van der Pas and Aaldering, 2020). However, there is little large-scale data about representation across entire news organizations, and even less so over a period of time.
