Detection of Fake News and Hoaxes on Information from Web Scraping using Classifier Methods

Ferry Wahyu Wibowo, Akhmad Dahlan, Wihayati Wihayati

doi:10.1109/isriti54043.2021.9702824

Abstract

Current technological developments let people obtain information directly from handheld gadgets. However, these technology media bring both good and bad impacts. Fake news and hoaxes have spread along with the social media applications available on these devices. This paper aims to detect fake news and hoaxes using classification models. The classification models implemented in this paper are support vector machine (SVM), random forest, nearest centroid, stochastic gradient descent (SGD), decision tree (Tree), bagging, AdaBoost, gradient boosting, multi-layer perceptron artificial neural network (MLP ANN), and K-nearest neighbors (K-NN). The dataset, obtained through web scraping, consists of 1,116 Indonesian-language news items, split 70% for training and 30% for testing. The test set contains 335 items: 205 fake news and hoax items and 130 real news items. The web content was processed using natural language processing (NLP) methods. Random forest is the best model for classifying fake news and hoaxes, with an accuracy of 89%. The next-highest scoring models, in descending order, are SVM, gradient boosting, AdaBoost, SGD, and decision tree, all with accuracies above 80%. The remaining methods have accuracies below 80%.
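The abstract names the pipeline stages (scraping, NLP preprocessing, a 70%/30% split, classifier training, accuracy scoring) but not how they are implemented. Below is a minimal standard-library Python sketch of the stages after scraping, assuming a bag-of-words representation and a nearest-centroid rule with cosine similarity; the paper's actual features, preprocessing steps, and classifier settings are not stated here, so every helper name and the toy headlines are hypothetical illustrations, not the authors' method or data. Rounding the test share up reproduces the reported split: 30% of 1,116 is 334.8, giving 335 test items (matching 205 + 130) and 781 training items.

```python
import math
import re
from collections import Counter, defaultdict


def split_sizes(n_items, test_percent=30):
    """Return (n_train, n_test), rounding the test share up to a whole item."""
    n_test = (n_items * test_percent + 99) // 100  # integer ceiling avoids float error
    return n_items - n_test, n_test


def tokenize(text):
    """Minimal NLP step (assumed): lower-case the text and keep runs of word characters."""
    return re.findall(r"\w+", text.lower())


def to_unit_vector(tokens):
    """Bag-of-words term counts scaled to unit Euclidean length."""
    counts = Counter(tokens)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fit_centroids(texts, labels):
    """Average the unit vectors of each class into one centroid per label."""
    sums = defaultdict(Counter)
    sizes = Counter(labels)
    for text, label in zip(texts, labels):
        for term, weight in to_unit_vector(tokenize(text)).items():
            sums[label][term] += weight
    return {label: {t: w / sizes[label] for t, w in vec.items()}
            for label, vec in sums.items()}


def predict(centroids, text):
    """Assign the label whose centroid is most cosine-similar to the text."""
    vec = to_unit_vector(tokenize(text))
    return max(sorted(centroids), key=lambda label: cosine(vec, centroids[label]))


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy headlines, invented for illustration only (not the paper's data).
train_texts = [
    "shocking secret cure doctors hide share now",
    "viral message claims free money share before deleted",
    "ministry announces new school schedule for next semester",
    "central bank keeps interest rate unchanged this month",
]
train_labels = ["hoax", "hoax", "real", "real"]
centroids = fit_centroids(train_texts, train_labels)

print(split_sizes(1116))  # (781, 335)
test_texts = ["share this secret cure now", "bank announces interest rate for next month"]
test_labels = ["hoax", "real"]
predictions = [predict(centroids, t) for t in test_texts]
print(predictions, accuracy(test_labels, predictions))
```

Swapping in any of the other listed classifiers would replace only the `fit_centroids`/`predict` pair; the split arithmetic and the accuracy metric stay the same.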

Full Text