Abstract

This study presents a novel framework for spam detection in micro-blogging social networks that combines a heterogeneous ensemble method with a hybrid dimensionality reduction technique. A hybrid of Information Gain (IG) and Principal Component Analysis (PCA) was used to select important features, and a heterogeneous ensemble of Naïve Bayes (NB), K-Nearest Neighbor (KNN), Logistic Regression (LR), and Repeated Incremental Pruning to Produce Error Reduction (RIPPER) classifiers, combined by Average of Probabilities (AOP), was used for spam detection. The proposed framework was applied to the MPI_SWS and SAC’13 Tip spam datasets, and the resulting models were evaluated on accuracy, precision, recall, F-measure, and area under the curve (AUC). In the experiments, the proposed framework (that is, Ensemble + IG + PCA) outperformed the other methods tested on the studied spam datasets. Specifically, it achieved an average accuracy of 87.5%, an average precision of 0.877, an average recall of 0.845, an average F-measure of 0.872, and an average AUC of 0.943. It also performed better than some existing methods. Consequently, this study shows that addressing high dimensionality in spam datasets, in this case with a hybrid of IG and PCA, and pairing it with a heterogeneous ensemble method can produce a more effective method for detecting spam content.

Highlights

  • An increase in penetration and access to the Internet along with developments in mobile technology in recent years has enhanced the popularity of Online Social Networks (OSNs) among Internet users

  • This study focused on proposing an effective machine-learning-based spam message detection framework by combining machine learning techniques (KNN, Logistic Regression (LR), Repeated Incremental Pruning to Produce Error Reduction (RIPPER), and Naïve Bayes (NB)), a dimensionality reduction method, and an ensemble method (the AOP technique)

  • A spam message detection framework based on a heterogeneous ensemble framework and a combination of dimensionality reduction techniques was proposed and implemented
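The Average of Probabilities (AOP) combination named above can be illustrated with a minimal sketch. It assumes each base classifier returns a class-probability estimate for a message; the function names, class labels, and probability values below are hypothetical and are not taken from the study.

```python
from typing import Dict, List


def average_of_probabilities(per_model_probs: List[Dict[str, float]]) -> Dict[str, float]:
    """Average the class-probability estimates produced by several classifiers."""
    if not per_model_probs:
        raise ValueError("need at least one classifier output")
    labels = per_model_probs[0].keys()
    n_models = len(per_model_probs)
    return {
        label: sum(probs[label] for probs in per_model_probs) / n_models
        for label in labels
    }


def predict(per_model_probs: List[Dict[str, float]]) -> str:
    """Return the label with the highest averaged probability."""
    averaged = average_of_probabilities(per_model_probs)
    return max(averaged, key=averaged.get)


# Hypothetical outputs for one message from four base classifiers
# (illustrative values only, not results from the paper).
outputs = [
    {"spam": 0.90, "ham": 0.10},  # e.g. NB
    {"spam": 0.40, "ham": 0.60},  # e.g. KNN
    {"spam": 0.70, "ham": 0.30},  # e.g. LR
    {"spam": 0.20, "ham": 0.80},  # e.g. RIPPER
]

print(average_of_probabilities(outputs))
print(predict(outputs))
```

In this made-up case the four classifiers split two against two on the hard label, so a simple majority vote would tie; averaging the probabilities lets the more confident estimates decide.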


Summary

Introduction

An increase in penetration and access to the Internet, along with developments in mobile technology in recent years, has enhanced the popularity of Online Social Networks (OSNs) among Internet users. OSNs such as Twitter, Facebook, Sina Weibo, and Instagram have about 2.62 billion users across the globe, a figure expected to reach an estimated 3.02 billion by 2021 [1, 2]. Users on these networks communicate with one another by sharing and discussing both personal and public issues and events. Users of micro-blogging social networks (MSNs) can share short messages called micro-posts, along with images and multimedia content, with other users [6]. They connect through a follower-followee relationship. As illustrated in Figure 1, if user A initiates a friendship connection with user B and user B does not reciprocate, then user A is user B’s follower and user B is user A’s followee; user B and user C, by contrast, are both follower and followee to each other.
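The follower-followee relationship described here is a directed graph, which the following minimal sketch makes concrete. The class and method names are hypothetical, chosen only for illustration; the users A, B, and C mirror the example in the text.

```python
from collections import defaultdict


class FollowGraph:
    """Directed follow graph: an edge a -> b means a follows b."""

    def __init__(self):
        # Maps each user to the set of users they follow.
        self._following = defaultdict(set)

    def follow(self, follower, followee):
        """Record that `follower` follows `followee` (one direction only)."""
        self._following[follower].add(followee)

    def is_follower(self, a, b):
        """True if a follows b."""
        return b in self._following.get(a, set())

    def is_mutual(self, a, b):
        """True if a and b follow each other."""
        return self.is_follower(a, b) and self.is_follower(b, a)


graph = FollowGraph()
graph.follow("A", "B")  # A follows B; B does not follow back
graph.follow("B", "C")  # B and C follow each other
graph.follow("C", "B")

print(graph.is_follower("A", "B"))
print(graph.is_follower("B", "A"))
print(graph.is_mutual("B", "C"))
```

Storing only outgoing edges keeps the one-way case (A and B) distinct from the reciprocal case (B and C) without any extra bookkeeping.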

