Abstract
Nowadays, Weibo has become a significant and popular information sharing platform in China. Meanwhile, spammer identification has been a big challenge for it. To mitigate the damage caused by spammers, classification algorithms from machine learning have been applied to distinguish spammers and non-spammers. However, most of the previous studies overlook the class imbalance problem of real-world data. In this paper, by analyzing the characteristics of spammers in Weibo, we select microblog content similarity, the average number of links, and the other 12 features to construct a comprehensive feature vector never seen before. Considering the existence of imbalance problems in spammer identification, an ensemble learning method is used to combine multiple base classifiers for improving the learning performance. During the training stage of base learners, fuzzy-logic-based oversampling and cost-sensitive support vector machine are considered to tackle imbalanced data at both data and algorithmic levels. The experimental results demonstrate that compared with the existing state-of-the-art methods, the recall rate of our proposed approach increases by 6.5% and reaches the precision value of 87.53% when used to deal with real-world Weibo datasets we collected.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.