Improving blog spam filters via machine learning

Weiwen Yang, Linchi Kwok

doi:10.1504/ijdats.2017.10006962

Abstract

As an important platform for electronic commerce, blogs can greatly influence internet users' purchasing decisions. Spam, however, can substantially reduce blogs' positive impact on electronic commerce. This paper introduces SK, an alternative algorithm that combines supervised learning (a support vector machine, SVM) with unsupervised learning (K-means++ clustering) to detect blog spam: if either method classifies a blog as spam, the blog is assigned to the spam category. Feature selection considers term frequency, inverse document frequency, binary representation, stop words, outgoing links, advertiser content, and keyword bursts. The accuracy of each model was tested and compared in experiments with 3,000 blog pages from the University of Maryland and 3,560 internet blogs. The findings suggest that combining the SVM algorithm with K-means++ clustering can increase spam-filtering accuracy by about 7% compared with using either method alone. The strengths and weaknesses of various spam-filtering methods are discussed, providing considerations for businesses choosing a spam filter.
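The SK decision rule described above (flag a blog as spam if either the SVM or the K-means++ clustering says so) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the linear SVM is trained with a Pegasos-style subgradient method, each cluster is tagged as spam by a majority vote of the labeled training blogs that fall in it (the abstract does not say how clusters are labeled), and the two features (outgoing-link rate, advertiser-term rate) and all values are hypothetical stand-ins for the paper's feature set.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids, p):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))


def train_linear_svm(xs, ys, lam=0.01, epochs=200, seed=0):
    """Linear SVM trained by Pegasos-style stochastic subgradient descent on the hinge loss.

    ys are +1 (spam) or -1 (legitimate). A constant 1.0 feature is appended so the
    bias is learned (and regularized) as the last weight.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], y) for x, y in zip(xs, ys)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y * dot(w, x) < 1  # margin check uses the weights before this step
            w = [(1 - eta * lam) * wj for wj in w]  # shrinkage from the L2 penalty
            if violated:
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def svm_says_spam(w, x):
    return dot(w, list(x) + [1.0]) > 0


def kmeans_pp(points, k=2, iters=20, seed=0):
    """Lloyd's K-means with k-means++ seeding (each new seed drawn with prob. ∝ D(x)^2)."""
    rng = random.Random(seed)
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        d2 = [min(sq_dist(p, c) for c in centroids) for p in points]
        r = rng.uniform(0, sum(d2))
        acc = 0.0
        for p, weight in zip(points, d2):
            acc += weight
            if acc >= r:
                break
        centroids.append(list(p))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(centroids, p)].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def fit_sk(xs, ys, k=2, seed=0):
    """Train both models. ASSUMPTION: a cluster is tagged spam when most of its
    labeled training blogs are spam (the abstract does not specify this step)."""
    w = train_linear_svm(xs, ys, seed=seed)
    centroids = kmeans_pp(xs, k=k, seed=seed)
    votes = [0] * k
    for x, y in zip(xs, ys):
        votes[nearest(centroids, x)] += y  # +1 for spam, -1 for legitimate
    spam_clusters = {i for i, v in enumerate(votes) if v > 0}
    return w, centroids, spam_clusters


def sk_is_spam(model, x):
    """OR rule: spam if either the SVM or the K-means++ clustering flags it."""
    w, centroids, spam_clusters = model
    return svm_says_spam(w, x) or nearest(centroids, x) in spam_clusters


# Hypothetical 2-D features per blog: (outgoing-link rate, advertiser-term rate).
X = [(0.9, 0.8), (0.8, 0.9), (0.85, 0.95), (0.95, 0.85),
     (0.1, 0.2), (0.2, 0.1), (0.15, 0.05), (0.05, 0.15)]
Y = [1, 1, 1, 1, -1, -1, -1, -1]
model = fit_sk(X, Y)
print(sk_is_spam(model, (0.9, 0.9)), sk_is_spam(model, (0.1, 0.1)))
```

Because a blog is flagged whenever either model fires, the OR rule catches at least as many spam blogs as the SVM alone, possibly at the cost of extra false positives. Whether net accuracy improves, as the abstract reports (about 7%), depends on the data; this toy example does not demonstrate it.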

Journal: International Journal of Data Analysis Techniques and Strategies
Publication Date: Jan 1, 2017
Citations: 1

Similar Papers

Improving blog spam filters via machine learning
Weiwen Yang ... Linchi Kwok
International Journal of Data Analysis Techniques and Strategies | VOL. 9 | 01 Jan 2017

A Primer on Machine Learning.
Audrene S Edwards ... Bruce Kaplan
Transplantation | VOL. 105 | 18 Aug 2020

About Our Authors
Information Systems Research | VOL. 24 | 01 Jun 2013

A Generalized Method for Sentiment Analysis across Different Sources
Abubakar M Ashir
Applied Computational Intelligence and Soft Computing | VOL. 2021 | 18 Dec 2021
