Abstract
In this paper, we implement a Natural Language Processing (NLP) solution that performs binary classification, categorizing each sentence as biased or unbiased. Detecting bias in today's media is challenging, but automated detection can help readers identify which sources exhibit bias. The general approach to classifying a sentence as biased or unbiased involves representing words and sentences with probabilistic features or pretrained vectorization models. Our final model relied only on probabilistic information about the associations between words, sentences, and each class. We used Pointwise Mutual Information (PMI) and Term Frequency-Inverse Document Frequency (TF-IDF) as heuristics for capturing the relationship between sentences and the biased and unbiased classes. We also leveraged Google's Universal Sentence Encoder (USE) to capture the meaning of the sentences. Our results revealed a possible limitation of USE's training data with respect to bias detection. Through topic analysis, we uncovered which topics are characterized by minimal bias, and we used these findings to contextualize the model's performance.
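The sketch below illustrates the kinds of features the abstract names: PMI word-class association scores, TF-IDF sentence vectors, and USE embeddings. The library choices (scikit-learn, TensorFlow Hub), the toy data, and the helper function are illustrative assumptions, not the authors' actual pipeline.

```python
# Hypothetical sketch of PMI, TF-IDF, and USE features for bias classification.
# Toy data and library choices are assumptions, not the paper's implementation.
import math
from collections import Counter

import numpy as np
import tensorflow_hub as hub
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy data: sentences labeled 1 (biased) or 0 (unbiased).
sentences = [
    "the senator's reckless plan will obviously ruin the economy",
    "the senator proposed a plan to adjust the tax rate",
    "critics slammed the disastrous out-of-touch policy",
    "the policy was discussed at a committee hearing on tuesday",
]
labels = [1, 0, 1, 0]
n_docs = len(sentences)

# --- PMI(word, class): how strongly a word co-occurs with a class -----------
word_counts = Counter()
word_class_counts = Counter()
for sent, y in zip(sentences, labels):
    for w in set(sent.split()):
        word_counts[w] += 1
        word_class_counts[(w, y)] += 1

def pmi(word, cls=1):
    """PMI(word, cls) = log p(word, cls) / (p(word) * p(cls))."""
    p_word = word_counts[word] / n_docs
    p_cls = labels.count(cls) / n_docs
    p_joint = word_class_counts[(word, cls)] / n_docs
    if p_joint == 0:
        return float("-inf")
    return math.log(p_joint / (p_word * p_cls))

print("PMI('reckless', biased) =", pmi("reckless", cls=1))

# --- TF-IDF sentence vectors -------------------------------------------------
tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(sentences)

# --- USE embeddings (512-dimensional) from TensorFlow Hub --------------------
use = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")
X_use = np.asarray(use(sentences))

# Either feature view (or their concatenation) could feed a downstream
# binary classifier for biased vs. unbiased sentences.
```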