Detecting spam web pages using content and link-based techniques

Rajendra Kumar Roul, Dhruvesh Parikh, Mit Shah, Shubham Rohan Asthana

doi:10.1007/s12046-015-0460-9

Abstract

Web spam is a technique by which irrelevant pages obtain a higher rank than relevant pages in a search engine’s results. Spam pages are generally inadequate and inappropriate results for the user. Many researchers are working on detecting spam pages. However, no universally effective technique has been developed so far that can detect all spam pages. This paper is an effort in that direction: we propose a combined approach of content-based and link-based techniques to identify spam pages. The content-based approach uses term density and a Part of Speech (POS) ratio test; the link-based approach explores collaborative detection using personalized page ranking to classify a Web page as spam or non-spam. The WEBSPAM-UK2006 dataset was used for the experiments, and the results have been compared with some existing approaches. A promising F-measure of 75.2% demonstrates the applicability and efficiency of our approach.
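
The two signals named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give formulas, so the definition of term density used here (a term's share of all tokens), the function names `term_density` and `personalized_pagerank`, the choice of trusted seed pages, and the damping factor of 0.85 are all assumptions made for illustration. The sketch also assumes every link target appears as a key in the graph and that the seed set is non-empty.

```python
from collections import Counter


def term_density(text, top_k=3):
    """Return the top_k most frequent terms with their share of all tokens.

    Assumed definition: density = occurrences of term / total tokens.
    """
    tokens = text.lower().split()
    if not tokens:
        return []
    counts = Counter(tokens)
    return [(term, count / len(tokens)) for term, count in counts.most_common(top_k)]


def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Power iteration in which teleportation jumps only to the seed pages.

    graph maps each page to the list of pages it links to (hypothetical data
    layout); seeds is a non-empty set of pages assumed to be trustworthy.
    """
    nodes = list(graph)
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for node in nodes:
            out_links = graph[node]
            if out_links:
                share = damping * rank[node] / len(out_links)
                for target in out_links:
                    new_rank[target] += share
            else:
                # Dangling page: return its mass to the seed set.
                for n in nodes:
                    new_rank[n] += damping * rank[node] * teleport[n]
        rank = new_rank
    return rank


# Toy example: "a" and "b" link to each other; the spam pages link to "a",
# but nothing reachable from the seed links back to them.
graph = {"a": ["b"], "b": ["a"], "spam1": ["spam2", "a"], "spam2": ["spam1"]}
scores = personalized_pagerank(graph, seeds={"a"})
density = term_density("buy cheap pills buy cheap buy")
```

In this toy graph the pages that are unreachable from the seed receive no score, which is the intuition behind using a personalized ranking as a link-based signal; a single term dominating the token count is the corresponding content-based hint. How the paper combines or thresholds these signals is not stated in the abstract.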

Similar Papers

Google Penguin: Evasion in Non-English Languages and a New Classifier
Abdulrahman Alarifi ... Ahmad Alkhaledi
01 Dec 2013

A new enhanced technique for link farm detection
D Saraswathi ... R Kavitha
01 Mar 2012

An Improved Framework for Content- and Link-Based Web-Spam Detection: A Combined Approach
Asim Shahzad ... Muhammad Zubair Rehman
Complexity | VOL. 2021
15 Nov 2021

Analysis of Web Spam for Non-English Content: Toward More Effective Language-Based Classifiers.
Mansour Alsaleh ... Abdulrahman Alarifi
PloS one | VOL. 11
17 Nov 2016

Journal: Sadhana
Publication Date: Feb 1, 2016
Citations: 11
