Ensemble machine learning approaches for fake news classification

Halyna Padalko, Dmytro Chumachenko, Vasyl Chomko, Sergiy Yakovlev

doi:10.32620/reks.2023.4.01

Abstract

In today’s interconnected digital landscape, the proliferation of fake news has become a significant challenge, with far-reaching implications for individuals, institutions, and societies. The rapid spread of misleading information undermines the credibility of genuine news outlets and threatens informed decision-making, public trust, and democratic processes. Recognizing the relevance and urgency of this issue, this research applies machine learning to combat the fake news menace. This study develops an ensemble machine learning model for fake news classification. The research object is the spread of fake news. The research subjects are machine learning methods for misinformation classification. Methods: we employed three state-of-the-art algorithms: LightGBM, XGBoost, and Balanced Random Forest (BRF). Each model was trained on a comprehensive dataset curated to encompass a diverse range of news articles, ensuring a broad representation of linguistic patterns and styles. A distinctive feature of the proposed approach is its emphasis on token importance. By leveraging specific tokens that exhibited a high degree of influence on classification outcomes, we enhanced the precision and reliability of the developed models. Results: the LightGBM model was the top performer among the three, with an F1-score of 97.74% and an accuracy of 97.64%. All three of the proposed models outperformed several existing models previously documented in the academic literature, which underscores the efficacy of the proposed ensemble approach. Conclusions: this study contributes a robust and scalable solution to the pressing challenge of fake news detection. By applying advanced machine learning techniques, the research findings pave the way for enhancing the integrity and veracity of information in an increasingly digitalized world, thereby safeguarding public trust and promoting informed discourse.
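
For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how accuracy and F1-score are conventionally computed from binary confusion-matrix counts. The function name `accuracy_and_f1` and the example counts are hypothetical and chosen only for illustration; they are not taken from the study, and the authors' own evaluation code may differ.

```python
def accuracy_and_f1(tp, fp, tn, fn):
    """Return (accuracy, F1) for the positive class from confusion counts.

    tp, fp, tn, fn are the numbers of true positives, false positives,
    true negatives, and false negatives, respectively.
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("at least one prediction is required")

    # Accuracy: share of all predictions that are correct.
    accuracy = (tp + tn) / total

    # F1: harmonic mean of precision and recall, which simplifies to
    # 2*tp / (2*tp + fp + fn). Defined as 0.0 when the denominator is zero.
    denominator = 2 * tp + fp + fn
    f1 = 2 * tp / denominator if denominator else 0.0
    return accuracy, f1


# Hypothetical counts for illustration only; not results from the study.
acc, f1 = accuracy_and_f1(tp=90, fp=10, tn=80, fn=20)
print(f"accuracy={acc:.4f}, F1={f1:.4f}")
```

Because F1 ignores true negatives while accuracy counts them, the two figures can diverge when the classes are imbalanced, which is one reason both are commonly reported together.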

Full Text