Fake News Detection Using Naive Bayes Classifier: A Comparative Study

Abhinandan Yadav

doi:10.54060/jmss.2023.22

Abstract

Machine learning is a subfield of artificial intelligence (AI) and computer science that uses data and algorithms to imitate how people learn, progressively improving in accuracy. It is an important component of the growing field of data science: statistical methods are used to train algorithms to make classifications or predictions and to uncover key insights. Fake news is false or misleading information presented as news, and detecting it is a classification problem. Classification begins with dataset collection, followed by preprocessing, feature selection, training and testing on the dataset, and finally running the classifier. News consists largely of written text, which is processed using natural language processing (NLP). NLP can analyze large amounts of plain written text and generate insights from it, and it involves methods such as data preprocessing and feature selection. Data preprocessing includes data cleaning, that is, removing incorrect, duplicate, or incomplete data from the dataset. Feature selection is done using CountVectorizer and the TF-IDF Vectorizer. The dataset is then split for training and testing; using similar data for both reduces the impact of data inconsistencies. After the model is fitted on the training set, it is tested by making predictions on the test set. A confusion matrix is then used to assess the performance of the classification model on the test data. The primary purpose is to use the Naive Bayes (NB) classifier to build two classification models, one using CountVectorizer and the other using the TF-IDF Vectorizer, and to compare their accuracy.
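The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the study's actual code: it assumes scikit-learn (whose `CountVectorizer` and `TfidfVectorizer` classes match the names used above), uses `MultinomialNB` as one common Naive Bayes variant for text, and runs on a tiny hypothetical corpus rather than the paper's dataset. The helper name `evaluate`, the split ratio, and the stop-word setting are likewise assumptions.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus (NOT the paper's dataset): 1 = fake, 0 = real.
texts = [
    "Aliens secretly control the world banking system",
    "Miracle pill cures every disease overnight doctors shocked",
    "Celebrity replaced by clone, insiders claim",
    "Scientists admit the moon is a hologram",
    "Secret law bans sleeping after midnight",
    "Drinking seawater makes you immortal, experts reveal",
    "City council approves budget for new public library",
    "Central bank holds interest rates steady this quarter",
    "Local team wins regional football championship",
    "University publishes annual research funding report",
    "Heavy rain expected across the region on Friday",
    "Government announces schedule for upcoming elections",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Hold out part of the data for testing; stratify keeps both classes in each split.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels
)


def evaluate(vectorizer, X_train, X_test, y_train, y_test):
    """Fit a Naive Bayes model on vectorized text; return accuracy and confusion matrix."""
    # Learn the vocabulary (and IDF weights, if any) from the training text only.
    train_features = vectorizer.fit_transform(X_train)
    test_features = vectorizer.transform(X_test)

    model = MultinomialNB()
    model.fit(train_features, y_train)
    predictions = model.predict(test_features)

    accuracy = accuracy_score(y_test, predictions)
    # Rows are true classes, columns are predicted classes, ordered [real, fake].
    matrix = confusion_matrix(y_test, predictions, labels=[0, 1])
    return accuracy, matrix


# Build one model per feature representation and compare them.
results = {
    "CountVectorizer": evaluate(
        CountVectorizer(stop_words="english"), X_train, X_test, y_train, y_test
    ),
    "TfidfVectorizer": evaluate(
        TfidfVectorizer(stop_words="english"), X_train, X_test, y_train, y_test
    ),
}

for name, (accuracy, matrix) in results.items():
    print(f"{name}: accuracy = {accuracy:.2f}")
    print(matrix)
```

Both vectorizers are fitted on the training split only, so no vocabulary or document-frequency information leaks from the test set. On a corpus this small the accuracy figures carry no meaning; the sketch only shows how the two models can be built and compared on equal terms.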

Full Text