Abstract
With the wide usage of the World Wide Web (WWW) and social media platforms, fake news can become rampant among users, who tend to create and share news without knowing whether it is authentic. The resulting dissemination of false information could become one of the most critical issues facing society. Fake news therefore needs to be detected as early as possible to avoid negative influence on people who may rely on such information when making important decisions. The aim of this paper is to develop an automated sentiment classifier model that helps individuals or readers understand the sentiment of fake news immediately. The Cross-Industry Standard Process for Data Mining (CRISP-DM) process model was applied as the research methodology. The fake news detection dataset was collected from the Kaggle website and was trained, tested, and validated with cross-validation and sampling methods. The performance of four machine learning algorithms, namely Naïve Bayes, Logistic Regression, Support Vector Machine, and Random Forest, was then compared to investigate which algorithm performs best for sentiment text classification. Samples of 1000 and 2500 instances from the fake news dataset were compared using 200 and 500 tokens. The results showed that Random Forest (RF) achieved the highest accuracy among the machine learning algorithms compared.
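The abstract does not give implementation details, so the following is only a minimal sketch of the kind of experiment it describes: restrict the vocabulary to the most common tokens, train a classifier, and estimate accuracy (and hence error rate) with k-fold cross-validation. It assumes whitespace tokenization and uses a small pure-Python multinomial Naïve Bayes, which stands in for just one of the four algorithms named above. The function names, the toy texts, and the labels are all hypothetical and are not taken from the paper or its Kaggle dataset.

```python
import math
import random
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization (the paper does not specify one).
    return text.lower().split()


def top_tokens(texts, max_tokens):
    # Keep only the max_tokens most frequent tokens as the vocabulary.
    counts = Counter(tok for t in texts for tok in tokenize(t))
    return {tok for tok, _ in counts.most_common(max_tokens)}


def train_nb(texts, labels, vocab):
    # Count documents per class and in-vocabulary token occurrences per class.
    class_docs = Counter(labels)
    token_counts = {c: Counter() for c in class_docs}
    for text, label in zip(texts, labels):
        token_counts[label].update(t for t in tokenize(text) if t in vocab)
    return class_docs, token_counts


def predict_nb(model, vocab, text):
    # Multinomial Naive Bayes with add-one (Laplace) smoothing, in log space.
    class_docs, token_counts = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for c in sorted(class_docs):
        score = math.log(class_docs[c] / total_docs)
        denom = sum(token_counts[c].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:
                score += math.log((token_counts[c][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = c, score
    return best_label


def cross_val_accuracy(texts, labels, max_tokens, k=5, seed=0):
    # Shuffle once, split into k folds, and test on each fold in turn.
    idx = list(range(len(texts)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train_x = [texts[i] for i in idx if i not in held_out]
        train_y = [labels[i] for i in idx if i not in held_out]
        # The vocabulary is built from the training folds only.
        vocab = top_tokens(train_x, max_tokens)
        model = train_nb(train_x, train_y, vocab)
        correct += sum(
            predict_nb(model, vocab, texts[i]) == labels[i] for i in fold
        )
    return correct / len(texts)


# Hypothetical toy corpus, invented purely for illustration.
texts = [f"shocking terrible scandal {i}" for i in range(10)]
texts += [f"wonderful hopeful recovery {i}" for i in range(10)]
labels = ["negative"] * 10 + ["positive"] * 10

for max_tokens in (2, 6):
    acc = cross_val_accuracy(texts, labels, max_tokens)
    print(f"tokens={max_tokens} accuracy={acc:.2f} error_rate={1 - acc:.2f}")
```

Under these assumptions, the same loop could be repeated over different sample sizes and token limits (for instance the 1000 versus 2500 instances and 200 versus 500 tokens mentioned above), and over other classifiers, to produce a comparison of the kind the abstract reports.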
Highlights
Fake news is viewed as one of the greatest threats to democracy, journalism, and freedom of expression
Error rates are used in supervised classification tasks to assess the consistency of the data mining process
The dataset is evaluated at different fixed numbers of instances and most common tokens to determine whether dataset size affects the performance of the machine learning models
Summary
Fake news is viewed as one of the greatest threats to democracy, journalism, and freedom of expression. According to Parikh and Atrey [3], researchers around the world have been heavily involved in the problem of fake news detection. Their studies have examined the impact of fake news and how people react to it after viewing a story's title and cover image. These factors might convince readers that the content of a story or news item is real. Fake news has drawn even more attention since the U.S. presidential election, which has prompted many researchers to look for better machine learning classification solutions [4].