Abstract

Web spamming refers to actions intended to mislead search engines into giving some pages a higher ranking than they deserve. Fundamentally, Web spam is designed to pollute search engine results and corrupt the user experience by driving traffic to particular spammed Web pages, regardless of the merits of those pages. Recently, there has been a dramatic increase in the amount of Web spam, leading to a degradation of search results. Most existing Web spam detection methods are supervised and require a large set of training Web pages. The proposed system studies the problem of unsupervised Web spam detection. It introduces the notion of spamicity to measure how likely a page is to be spam. Spamicity is a more flexible measure than the output of traditional supervised classification methods. In the proposed system, link spam and content spam detection techniques are used to determine the spamicity score of a Web page. A threshold set by empirical analysis then classifies the Web page as spam or non-spam.
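
The scoring-and-threshold step described above can be illustrated with a minimal sketch. The abstract does not say how the link and content evidence are combined or what threshold value is used, so everything here is an assumption made for illustration: the function names `spamicity` and `classify`, the weighted average, the requirement that both input scores lie in [0, 1], and the default values of `link_weight` and `threshold`.

```python
def spamicity(link_score: float, content_score: float,
              link_weight: float = 0.5) -> float:
    """Combine a link-based and a content-based spam score into one value.

    Hypothetical combination rule: a weighted average. The paper's actual
    formula is not given in the abstract. Both scores are assumed to be
    already normalised to [0, 1], where 1 means "most spam-like".
    """
    for name, value in (("link_score", link_score),
                        ("content_score", content_score),
                        ("link_weight", link_weight)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return link_weight * link_score + (1.0 - link_weight) * content_score


def classify(score: float, threshold: float = 0.5) -> str:
    """Label a page by comparing its spamicity to a threshold.

    The default threshold is a placeholder; the abstract says the real
    value is chosen by empirical analysis.
    """
    return "spam" if score >= threshold else "non-spam"


if __name__ == "__main__":
    # Illustrative inputs only, not data from the paper.
    score = spamicity(link_score=0.8, content_score=0.4)
    print(round(score, 2), classify(score))
```

Because the score is continuous, the same pages can be relabelled simply by moving the threshold, which is one way to read the abstract's claim that spamicity is more flexible than a fixed classifier output.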
