An Improved Multiple Features and Machine Learning-Based Approach for Detecting Clickbait News on Social Networks

Mohammed Al-Sarem, Tawfik Al-Hadhrami, Zeyad Ghaleb Al-Mekhlafi, Badiea Abdulkarem Mohammed, Mohammed Hadwan, Talal Sarheed Alshammari, Abdulrahman Alreshidi, Mohammad T Alshammari, Faisal Saeed

doi:10.3390/app11209487

Abstract

The widespread use of social media has led to the increasing popularity of online advertisements, which has been accompanied by a disturbing spread of clickbait headlines. Clickbait dissatisfies users because the article content does not match their expectations. Detecting clickbait posts in online social networks is an important task in fighting this issue. Clickbait posts use phrases intended mainly to attract a user's attention and get them to click on a specific fake link or website. That is, clickbait headlines use misleading titles, which can conceal important information about the target website. It is very difficult to recognize these clickbait headlines manually. Therefore, there is a need for an intelligent method to detect clickbait and fake advertisements on social networks. Several machine learning methods have been applied for this purpose. However, the reported performance (accuracy) only reached 87% and still needs to be improved. In addition, most existing studies were conducted on English headlines and content, and few focused specifically on detecting clickbait headlines in Arabic. Therefore, this study constructed the first Arabic clickbait headline news dataset and presents an improved multiple feature-based approach for detecting clickbait news on social networks in the Arabic language. The proposed approach includes three main phases: data collection, data preparation, and machine learning model training and testing. The collected dataset included 54,893 Arabic news items from Twitter (after pre-processing). Among these news items, 23,981 were clickbait news (43.69%) and 30,912 were legitimate news (56.31%). This dataset was pre-processed, and then the most important features were selected using the ANOVA F-test. Several machine learning (ML) methods were then applied with hyper-parameter tuning to find the optimal settings. Finally, the ML models were evaluated, and the overall performance is reported in this paper. The experimental results show that the Support Vector Machine (SVM) with the top 10% of ANOVA F-test features (user-based features (UFs) and content-based features (CFs)) obtained the best performance, achieving a detection accuracy of 92.16%.
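The feature-selection step described in the abstract can be illustrated with a minimal sketch. It computes a one-way ANOVA F-statistic per feature (between-class variance divided by within-class variance) and keeps the top fraction of features by score. The function names `anova_f` and `select_top_fraction`, the toy data, and the rounding rule for "top 10%" are assumptions made for illustration; the authors' actual implementation is not given on this page.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F-statistic of a single feature across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")
    grand = mean(values)
    # Between-class and within-class sums of squares.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_top_fraction(X, y, fraction=0.10):
    """Return (sorted indices of the top `fraction` of features, all F-scores)."""
    n_features = len(X[0])
    scores = [anova_f([row[j] for row in X], y) for j in range(n_features)]
    keep = max(1, round(n_features * fraction))  # assumed rounding rule
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep]), scores


# Hypothetical toy data: 6 posts, 10 features, label 1 = clickbait.
labels = [1, 1, 1, 0, 0, 0]
X = [[(i * 7 + j * 3) % 5 for j in range(10)] for i in range(6)]
for i, y in enumerate(labels):
    X[i][3] = 10.0 * y + 0.1 * i  # make feature 3 strongly class-dependent

selected, scores = select_top_fraction(X, labels, fraction=0.10)
print(selected)
```

In this toy setup only feature 3 separates the two classes, so it is the single feature kept when 10% of ten features are retained. The reduced feature matrix would then be passed to a classifier such as an SVM.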

Highlights

Social networks have become the main environment for communicating, sharing, and posting news on the Internet
We proposed an effective approach for enhancing the detection process using a feature selection technique, namely a one-way ANOVA F-test
The results show that the proposed model enhances the performance of some classifiers in terms of accuracy, precision, and recall
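As a reminder of how the three reported metrics relate to a classifier's predictions, here is a small sketch that derives accuracy, precision, and recall from binary labels. The function name `binary_metrics` and the example labels are hypothetical and are not taken from the paper's experiments.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, and recall for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when nothing is predicted/labelled positive.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical labels: 1 = clickbait, 0 = legitimate.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 1, 1]
print(binary_metrics(y_true, y_pred))
```

Precision penalizes legitimate posts flagged as clickbait, while recall penalizes clickbait posts that were missed, which is why the two are reported alongside accuracy.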

Summary

Introduction

Social networks have become the main environment for communicating, sharing, and posting news on the Internet. For each data point in the dataset, they extracted sentence structure, clickbait language, word patterns, and n-gram features. The results they achieved are as follows: SVM: an accuracy rate of 93% with 95% precision, 90%. They found that the CNN-LSTM model, when implemented with pre-trained GloVe embedding, yields the best results based on accuracy, recall, precision, and F1-score performance metrics. They identify eight other types of clickbait headlines: reaction, reasoning, revealing, number, hypothesis/guess, questionable, forward referencing, and shocking/unbelievable. The previous studies used hybrid categorization techniques such as Gradient Boosted Decision Trees, linear regression, Naïve Bayes and random forest methods, SVM, decision tree, logistic regression, and convolutional neural network deep learning. Most of these studies used datasets with headlines written in English. The results show that the proposed model enhances the performance of some classifiers in terms of accuracy, precision, and recall.
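The n-gram features mentioned among the prior-work feature sets can be sketched as follows. This is a generic word n-gram counter under simple assumptions (lowercasing and whitespace tokenization); the function names and the example headline are hypothetical, and the cited studies may have tokenized differently.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return the list of word n-grams in `text` (whitespace tokenization)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(headlines, n):
    """Count word n-grams over a collection of headlines."""
    counts = Counter()
    for headline in headlines:
        counts.update(word_ngrams(headline, n))
    return counts


# Hypothetical headline used only for illustration.
print(word_ngrams("You will not believe this", 2))
```

Counts like these are typically turned into a sparse numeric feature vector per headline before classification.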

Characteristics of Clickbait News

Machine Learning and Deep Learning Methods for Clickbait Detection

Problem Formulation for Clickbait Detection

To solve the [...], let [...] be a dataset of all extracted posts.

As a [...] function that generates [...]

Materials and Methods

Data Collection

Data Annotation

Pre-Processing

Numeric Representation

Feature Selection

Model Evaluation

Experimental Design

Results and Findings

Conclusions


Journal: Applied Sciences	Publication Date: Oct 13, 2021
Citations: 14	License type: CC BY 4.0
