Abstract

The proliferation of fake news across languages and domains on social media platforms poses a significant societal threat. Current automatic detection methods for low-resource languages (e.g., Swahili and Indonesian) are limited by two factors: the sequence-length restrictions of pre-trained language models (PLMs) such as multilingual Bidirectional Encoder Representations from Transformers (mBERT), and noisy training data. This work proposes a novel and efficient multilingual fake news detection (MFND) approach that addresses both challenges. Our solution uses a hybrid extractive and abstractive summarization strategy to keep only the most relevant content from each news article. This substantially shortens the input while preserving the information needed for fake news classification. The condensed text is then fed to mBERT for classification. Extensive evaluations on a publicly available multilingual dataset show that our approach outperforms state-of-the-art (SOTA) methods. Quantitative and qualitative analyses highlight the method's strengths: it sets new performance benchmarks and shows how content condensation improves both model accuracy and efficiency. This framework paves the way for faster, more accurate MFND and, in turn, more robust information ecosystems.
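The abstract does not specify which extractive summarizer is used, so the following is only a minimal sketch of the general idea behind the first stage. It condenses an article so that it fits a fixed input budget (mBERT accepts at most 512 tokens). It assumes a simple frequency-based sentence scorer, a naive regex sentence splitter, and whitespace word counts as a rough stand-in for mBERT's WordPiece token counts. The function names (`split_sentences`, `tokenize`, `extractive_condense`) are hypothetical and are not taken from the paper.

```python
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence: str) -> list[str]:
    # Lowercased word tokens; \w is Unicode-aware, so it also matches
    # words in languages such as Swahili or Indonesian.
    return re.findall(r"\w+", sentence.lower())


def extractive_condense(text: str, token_budget: int = 512) -> str:
    """Hypothetical extractive step: keep the highest-scoring sentences
    that fit within token_budget, emitted in their original order."""
    sentences = split_sentences(text)
    # Word frequencies across the whole article.
    freqs = Counter(tok for s in sentences for tok in tokenize(s))

    def score(sentence: str) -> float:
        # Average article-level frequency of the sentence's words.
        toks = tokenize(sentence)
        return sum(freqs[t] for t in toks) / len(toks) if toks else 0.0

    # Rank sentence indices by score (stable sort keeps ties in text order).
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)

    chosen, used = [], 0
    for i in ranked:
        # Whitespace word count is a rough proxy for subword tokens; WordPiece
        # usually yields more tokens than words, so a real pipeline would count
        # with the model's own tokenizer.
        n = len(sentences[i].split())
        if used + n <= token_budget:
            chosen.append(i)
            used += n

    # Restore original sentence order so the condensed text stays readable.
    return " ".join(sentences[i] for i in sorted(chosen))


if __name__ == "__main__":
    article = ("Cats are great. Dogs are great too. "
               "The weather is cold today. Cats and dogs are great pets.")
    print(extractive_condense(article, token_budget=10))
```

In the pipeline the abstract describes, text condensed this way would then go to an abstractive summarizer and finally to mBERT for classification. Those stages depend on specific pretrained models and are not sketched here.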