Abstract

The widespread use of social media has increased the popularity of online advertisements, which has been accompanied by a disturbing spread of clickbait headlines. Clickbait frustrates users because the article content does not match the expectation set by the headline. Detecting clickbait posts in online social networks is therefore an important task. Clickbait posts use phrases designed mainly to attract a user’s attention so that the user clicks on a specific fake link or website. In other words, clickbait headlines use misleading titles that may hide important information about the target website. Recognizing these headlines manually is very difficult, so an intelligent method is needed to detect clickbait and fake advertisements on social networks. Several machine learning methods have been applied to this task. However, the reported accuracy has only reached 87% and still needs to be improved. In addition, most existing studies were conducted on English headlines and content; few focused specifically on detecting clickbait headlines in Arabic. This study therefore constructs the first Arabic clickbait headline news dataset and presents an improved multiple feature-based approach for detecting clickbait news on social networks in Arabic. The proposed approach has three main phases: data collection, data preparation, and machine learning model training and testing. The collected dataset contains 54,893 Arabic news items from Twitter (after pre-processing), of which 23,981 (43.69%) are clickbait and 30,912 (56.31%) are legitimate news. The dataset was pre-processed, and the most important features were then selected using the ANOVA F-test. Several machine learning (ML) methods were then applied with hyper-parameter tuning to find the best settings. Finally, the ML models were evaluated, and their overall performance is reported in this paper. The experimental results show that the Support Vector Machine (SVM) with the top 10% of ANOVA F-test features (user-based features (UFs) and content-based features (CFs)) performed best, achieving a detection accuracy of 92.16%.
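
The feature selection step named in the abstract can be illustrated with a minimal sketch. It computes a one-way ANOVA F-statistic for each feature against the class label and keeps the top 10% of features by score. This is not the authors' implementation: the function names `anova_f` and `select_top_percent` and the toy data are assumptions made for illustration, and the SVM training and hyper-parameter tuning steps are not shown.

```python
from statistics import fmean


def anova_f(values, labels):
    """One-way ANOVA F-statistic of a single feature across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)  # requires k >= 2 classes and n > k samples
    grand_mean = fmean(values)
    # Variation of the class means around the overall mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups.values())
    # Variation of the samples around their own class mean.
    ss_within = sum((v - fmean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_top_percent(X, y, percent=10):
    """Return (indices of the top `percent`% features by F-score, all scores)."""
    n_features = len(X[0])
    scores = [anova_f([row[j] for row in X], y) for j in range(n_features)]
    keep = max(1, round(n_features * percent / 100))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep]), scores


# Hypothetical toy data: 6 posts, 10 numeric features, label 1 = clickbait.
y = [1, 1, 1, 0, 0, 0]
X = [[float((i * 7 + j * 3) % 5) for j in range(10)] for i in range(6)]
# Make feature 4 strongly class-dependent so it should rank first.
for i, offset in enumerate([0.0, 0.1, -0.1, 0.0, 0.1, -0.1]):
    X[i][4] = 10.0 * y[i] + offset

selected, scores = select_top_percent(X, y, percent=10)
print("selected feature indices:", selected)
```

A feature whose class means differ widely relative to its spread within each class gets a high F-score, which is why such features survive the 10% cut. In practice a library routine would normally be used for this step instead of hand-written code.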

Highlights

  • Social networks have become the main environment for communicating, sharing, and posting news on the Internet

  • We proposed an effective approach for enhancing the detection process using a feature selection technique, namely a one-way ANOVA F-test

  • The results show that the proposed model enhances the performance of some classifiers in terms of accuracy, precision, and recall


Summary

Introduction

Social networks have become the main environment for communicating, sharing, and posting news on the Internet. For each data point in the dataset, they extracted sentence structure, clickbait language, word patterns, and n-gram features. The results they achieved are as follows: SVM: an accuracy rate of 93% with 95% precision, 90%. They found that the CNN-LSTM model, when implemented with pre-trained GloVe embeddings, yields the best results based on the accuracy, recall, precision, and F1-score performance metrics. They identify eight other types of clickbait headlines: reaction, reasoning, revealing, number, hypothesis/guess, questionable, forward referencing, and shocking/unbelievable. Previous studies used hybrid categorization techniques such as Gradient Boosted Decision Trees, linear regression, Naïve Bayes, random forest, SVM, decision tree, logistic regression, and convolutional neural network deep learning methods. Most of these studies used datasets with headlines written in English. The results show that the proposed model enhances the performance of some classifiers in terms of accuracy, precision, and recall.
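
For reference, the four metrics mentioned above (accuracy, precision, recall, and F1-score) can be computed from a list of true and predicted labels as in the following sketch. The function name `classification_metrics` and the example labels are hypothetical and are not taken from the paper or its results.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary task (positive = clickbait)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels for eight headlines (1 = clickbait, 0 = legitimate).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Precision measures how many headlines flagged as clickbait really are clickbait, while recall measures how many clickbait headlines were caught, so the two can diverge even when accuracy looks high.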

Characteristics of Clickbait News
Machine Learning and Deep Learning Methods for Clickbait Detection
Problem Formulation for Clickbait Detection
To solve the ... let ... be a dataset of all extracted posts ...
... a function that generates ...
Materials and Methods
Data Collection
Data Annotation
Pre-Processing
Numeric Representation
Feature Selection
Model Evaluation
Experimental Design
Results and Findings
Conclusions
