Abstract
In this paper, we implement a Natural Language Processing (NLP) solution that performs binary classification, categorizing each sentence as biased or unbiased. Detecting bias in today's media is challenging, but automated detection can help readers identify which sources exhibit bias. The general approach to classifying a sentence as biased or unbiased involves representing words and sentences with probabilistic features or pretrained vectorization models. Our final model relied only on probabilistic information about the associations between words, sentences, and each class. We used Pointwise Mutual Information (PMI) and Term Frequency-Inverse Document Frequency (TF-IDF) as heuristics for capturing the relationship between sentences and the biased and unbiased classes. We also leveraged Google's Universal Sentence Encoder (USE) to capture the meaning of the sentences. Our results revealed a possible limitation of USE's training data with respect to bias detection. Through topic analysis, we uncovered which topics are characterized by minimal bias, and we used these findings to contextualize the model's performance.
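The sketch below illustrates the kinds of features the abstract names: PMI word-class association scores, TF-IDF sentence vectors, and USE embeddings. The library choices (scikit-learn, TensorFlow Hub), the toy data, and the helper function are illustrative assumptions, not the authors' actual pipeline.

```python
# Hypothetical sketch of PMI, TF-IDF, and USE features for bias classification.
# Toy data and library choices are assumptions, not the paper's implementation.
import math
from collections import Counter

import numpy as np
import tensorflow_hub as hub
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy data: sentences labeled 1 (biased) or 0 (unbiased).
sentences = [
    "the senator's reckless plan will obviously ruin the economy",
    "the senator proposed a plan to adjust the tax rate",
    "critics slammed the disastrous out-of-touch policy",
    "the policy was discussed at a committee hearing on tuesday",
]
labels = [1, 0, 1, 0]
n_docs = len(sentences)

# --- PMI(word, class): how strongly a word co-occurs with a class -----------
word_counts = Counter()
word_class_counts = Counter()
for sent, y in zip(sentences, labels):
    for w in set(sent.split()):
        word_counts[w] += 1
        word_class_counts[(w, y)] += 1

def pmi(word, cls=1):
    """PMI(word, cls) = log p(word, cls) / (p(word) * p(cls))."""
    p_word = word_counts[word] / n_docs
    p_cls = labels.count(cls) / n_docs
    p_joint = word_class_counts[(word, cls)] / n_docs
    if p_joint == 0:
        return float("-inf")
    return math.log(p_joint / (p_word * p_cls))

print("PMI('reckless', biased) =", pmi("reckless", cls=1))

# --- TF-IDF sentence vectors -------------------------------------------------
tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(sentences)

# --- USE embeddings (512-dimensional) from TensorFlow Hub --------------------
use = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")
X_use = np.asarray(use(sentences))

# Either feature view (or their concatenation) could feed a downstream
# binary classifier for biased vs. unbiased sentences.
```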