Abstract

Web spam is a negative practice carried out by spammers to produce fake searchengines results for improving rank position of their Web pages. It is available on arena of World Wide Web (WWW) in different forms and lacks a consistent definition. The search engines are struggling to eliminate spam pages through machine learning (ML) detectors. Mostly, search engines measure the quality of websites by using different factors (signals) such as, number of visitors, body text, anchor text, back link and forward link etc. information and, and spammers try to induce these signals into their desired pages to subvert ranking function of search engines. This study compares the detection efficiencyof different ML classifiers trained and tested on WebSpam UK2007 data set. The results of our study show that random forest has achieve higher score than other well-known classifiers.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call