ADVANCED TURKISH FAKE NEWS PREDICTION WITH BIDIRECTIONAL ENCODER REPRESENTATIONS FROM TRANSFORMERS

Mehmet Bozuyla

doi:10.36306/konjes.995060

Abstract

The increasing use of social media and the internet generates a significant amount of information that can be analyzed from various perspectives. Fake news, in particular, is defined as false news presented as factual news, and it is generally fabricated with the aim of manipulation. Fake news identification is essentially a natural language analysis problem, and machine learning algorithms have emerged as automated predictors for it. Well-known machine learning algorithms such as Naïve Bayes (NB) and Random Forest (RF) have been used successfully for the fake news identification problem. Turkish is a morphologically rich, agglutinative language, and this complexity requires extensive language pre-processing and feature selection. Recent neural language models such as Bidirectional Encoder Representations from Transformers (BERT) offer morphologically rich languages like Turkish a relatively straightforward pipeline for solving natural language problems. In this work, we compared NB, RF, Support Vector Machine (SVM), Naïve Bayes Multinomial (NBM) and Logistic Regression (LR), each applied on top of correlation-based feature selection, with the newly proposed Turkish BERT (BERTurk) for identifying Turkish fake news. We obtained 99.90% accuracy in fake news identification with a highly efficient model that does not require substantial language pre-processing.
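The abstract does not say how the correlation-based feature selection is carried out. A common formulation, and the basis of Weka's CfsSubsetEval, is Hall's merit heuristic, which scores a subset of k features as k·r_cf / sqrt(k + k(k−1)·r_ff). Here r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation, so the heuristic favors features that predict the label but are not redundant with each other. The pure-Python sketch below assumes that heuristic, uses absolute Pearson correlation, and applies a plain greedy forward search (Weka's default is best-first search). The function names and the toy data are hypothetical illustrations, not the author's implementation.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (sx * sy)

def cfs_merit(subset, columns, labels):
    """Hall's CFS merit (assumed): k * r_cf / sqrt(k + k * (k - 1) * r_ff)."""
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean absolute feature-class correlation (relevance).
    r_cf = sum(abs(pearson(columns[f], labels)) for f in subset) / k
    # Mean absolute feature-feature correlation (redundancy).
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = (sum(abs(pearson(columns[a], columns[b])) for a, b in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

def greedy_cfs(columns, labels):
    """Greedy forward search: add the feature that most improves the merit."""
    selected, best = [], 0.0
    remaining = list(range(len(columns)))
    while remaining:
        merit, f = max((cfs_merit(selected + [f], columns, labels), f) for f in remaining)
        if merit <= best:
            break  # no candidate improves the subset, so stop
        selected.append(f)
        remaining.remove(f)
        best = merit
    return selected, best

# Hypothetical toy data: binary term-presence features, one entry per news item.
labels = [1, 1, 1, 1, 0, 0, 0, 0]         # 1 = fake, 0 = real (illustrative labels)
columns = [
    [1, 1, 1, 1, 0, 0, 0, 0],             # feature 0: tracks the label exactly
    [1, 1, 1, 0, 0, 0, 0, 0],             # feature 1: largely redundant with feature 0
    [1, 0, 1, 0, 1, 0, 1, 0],             # feature 2: uncorrelated with the label
]

selected, merit = greedy_cfs(columns, labels)
print(selected, round(merit, 3))          # prints: [0] 1.0
```

On this toy data, feature 1 is rejected because adding it to feature 0 lowers the merit (it is strongly correlated with feature 0), and feature 2 is rejected because it carries no label signal. The selected columns would then be passed to classifiers such as NB, RF, SVM, NBM or LR.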
