A Spammer Identification Method for Class Imbalanced Weibo Datasets

Wenbing Tang, Zuohua Ding, Mengchu Zhou

doi:10.1109/access.2019.2901756


Open Access


Abstract

Weibo has become a significant and popular information-sharing platform in China, and identifying spammers on it has become a major challenge. To mitigate the damage spammers cause, machine learning classification algorithms have been applied to distinguish spammers from non-spammers. However, most previous studies overlook the class imbalance of real-world data. In this paper, by analyzing the characteristics of spammers on Weibo, we select microblog content similarity, the average number of links, and 12 other features to construct a novel, comprehensive feature vector. Because spammer identification is an imbalanced classification problem, an ensemble learning method combines multiple base classifiers to improve learning performance. During the training of the base learners, fuzzy-logic-based oversampling and a cost-sensitive support vector machine tackle the imbalanced data at the data level and the algorithmic level, respectively. Experimental results on real-world Weibo datasets we collected show that, compared with existing state-of-the-art methods, the recall of our proposed approach increases by 6.5% and its precision reaches 87.53%.
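The abstract names microblog content similarity and the average number of links as two of the features, but does not define them on this page. The sketch below is a minimal illustration under stated assumptions: content similarity is taken to be the mean pairwise Jaccard similarity between the token sets of one user's posts, and links are URLs matched by a simple regular expression. The function names (`content_similarity`, `average_links`), the URL pattern, and the tokenizer are hypothetical, not the authors' definitions. Real Weibo text is mostly Chinese and would need a proper word segmenter instead of the `\w+` tokenization used here.

```python
import re
from itertools import combinations

# Hypothetical URL pattern (an assumption); real Weibo posts often use t.cn short links.
URL_RE = re.compile(r"https?://\S+")


def tokens(post: str) -> set[str]:
    """Lower-cased word tokens with URLs removed (assumed tokenizer, not the paper's)."""
    text = URL_RE.sub(" ", post.lower())
    return set(re.findall(r"\w+", text))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def content_similarity(posts: list[str]) -> float:
    """Assumed definition: mean Jaccard similarity over all pairs of a user's posts."""
    token_sets = [tokens(p) for p in posts]
    pairs = list(combinations(token_sets, 2))
    if not pairs:  # fewer than two posts: no pairs to compare
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def average_links(posts: list[str]) -> float:
    """Assumed definition: mean number of regex-matched URLs per post."""
    if not posts:
        return 0.0
    return sum(len(URL_RE.findall(p)) for p in posts) / len(posts)


if __name__ == "__main__":
    # Three near-identical promotional posts from one hypothetical account.
    posts = [
        "Win a free phone http://t.cn/abc",
        "Win a free phone http://t.cn/xyz",
        "Win a free phone now",
    ]
    print(content_similarity(posts), average_links(posts))
```

Accounts that repost near-identical promotional text push both values up: for the three sample posts, content similarity is about 0.87 and there are about 0.67 links per post, whereas a user writing varied, link-free posts would score low on both.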


Journal: IEEE Access	Publication Date: Jan 1, 2019
Citations: 50	License type: cc-by-nc-nd


Similar Papers

A Health state-related ensemble deep learning method for aircraft engine remaining useful life prediction
Yujie Cheng ... Dengwei Song
Applied Soft Computing | VOL. 135
20 Jan 2023

Sentiment analysis of coronavirus data with ensemble and machine learning methods
Muhammet Sinan Başarslan ... Fatih Kayaalp
Turkish Journal of Engineering | VOL. 8
30 Apr 2024

Cost Sensitive SVM with Non-informative Examples Elimination for Imbalanced Postoperative Risk Management Problem
Maciej Zięba ... Jerzy Świątek
01 Jan 2014

Sentiment analysis with ensemble and machine learning methods in multi-domain datasets
Muhammet Sinan Başarslan ... Fatih Kayaalp
Turkish Journal of Engineering | VOL. 7
15 Apr 2023
