Abstract

News that intentionally contains false information, known as "fake news", is common on the Internet and often causes social disruption. To address this problem, automatic fake news detection using supervised learning has been actively studied. Although accuracy is improving, a major challenge for practical application remains: models do not work well on news from unknown fields (domains) because of domain biases. The goal of this study is to mitigate these domain biases and improve the accuracy of cross-domain fake news detection, in which models are tested on news from domains not seen during training. We first try to mitigate the bias by masking noun phrases, which are considered a major source of domain bias. However, masking did not improve accuracy. We then point out that the dataset used in this study has the property that it always contains pairs of fake and real news on the exact same topic. In this paper, we focus on this property of the dataset and examine how it may affect domain bias and accuracy. Comparative experiments show that accuracy is higher when models are trained on a dataset with this property. We suggest that a fake news dataset consisting of paired news could be effective for cross-domain detection.
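
To make the cross-domain setting concrete, the following is a minimal sketch of a leave-one-domain-out split, in which each domain is held out for testing in turn and the remaining domains are used for training. It is an illustration only and not the procedure used in this study: the function name, the record keys ("domain", "text", "label"), and the toy records are all assumptions made for the example.

```python
from collections import defaultdict


def leave_one_domain_out(articles):
    """Yield (held_out_domain, train, test) splits, one per domain.

    `articles` is a list of dicts with the hypothetical keys
    "domain", "text", and "label"; these names are assumptions.
    """
    # Group the articles by their domain.
    by_domain = defaultdict(list)
    for article in articles:
        by_domain[article["domain"]].append(article)

    # Hold out each domain in turn; train on all remaining domains.
    for held_out in sorted(by_domain):
        test = by_domain[held_out]
        train = [
            article
            for domain, items in by_domain.items()
            if domain != held_out
            for article in items
        ]
        yield held_out, train, test


# Toy placeholder records (not real data): one fake/real pair per domain.
toy_articles = [
    {"domain": "politics", "text": "placeholder A", "label": "fake"},
    {"domain": "politics", "text": "placeholder B", "label": "real"},
    {"domain": "sports", "text": "placeholder C", "label": "fake"},
    {"domain": "sports", "text": "placeholder D", "label": "real"},
]

splits = list(leave_one_domain_out(toy_articles))
for held_out, train, test in splits:
    print(held_out, len(train), len(test))
```

A classifier evaluated this way never sees the test domain during training, which is the condition under which domain bias is expected to hurt accuracy.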
