An Unsupervised Model to detect Web Spam based on Qualified Link Analysis and Language Models

B Lakshmipathi, Shrijina Sreenivasan

doi:10.5120/10455-5163

Abstract

With the massive use of the internet and search engines, web spam has emerged as a major problem. Web spam can be detected by analyzing various features of web pages and classifying each page as spam or non-spam. The proposed work uses unsupervised learning algorithms to characterize web pages by their link-based and content-based features, comparing the different sources of information in the source page and the target page. The first unsupervised technique considered is the Hidden Markov Model, which captures the different browsing patterns of users: users may reach a page not only by following direct hyperlinks but also by typing URLs or opening multiple windows. Because unsupervised techniques have no predefined classes to map outcomes to, they estimate all possible probabilities of a relation between the source and target page. This helps attain higher efficiency in detecting web spam even when the dataset is small. The other unsupervised methods applied to the same task are the Self-Organizing Map (SOM) and Adaptive Resonance Theory (ART). Finally, the performance of all the techniques is evaluated and reported in increasing order of their performance metric.
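As an illustration of comparing content between a source page and a target page, the sketch below scores the disagreement between two pieces of text using smoothed unigram language models and the Kullback-Leibler divergence. This is only one plausible reading of "language models" in the title. The abstract does not specify the measure, so the choice of KL divergence, the additive smoothing constant `alpha`, whitespace tokenization, and the function names `unigram_model` and `kl_divergence` are all assumptions made for this example, not the authors' method.

```python
import math
from collections import Counter


def unigram_model(text, vocab, alpha=1.0):
    """Return an additively smoothed unigram distribution over vocab.

    alpha is a hypothetical smoothing constant chosen for this sketch.
    """
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(source_text, target_text, alpha=1.0):
    """KL divergence D(P_source || P_target) over the shared vocabulary.

    A larger value means the two texts use words more differently,
    which could serve as one content-based feature for a link.
    """
    vocab = set(source_text.lower().split()) | set(target_text.lower().split())
    p = unigram_model(source_text, vocab, alpha)
    q = unigram_model(target_text, vocab, alpha)
    return sum(p[w] * math.log(p[w] / q[w]) for w in vocab)


if __name__ == "__main__":
    anchor = "cheap flights to paris"
    related = "cheap flights to paris and rome"
    unrelated = "buy pills online now"
    print(kl_divergence(anchor, related))
    print(kl_divergence(anchor, unrelated))
```

Under these assumptions, a link whose source text diverges strongly from the target page content would receive a higher score, and such scores could be fed, alongside link-based features, into an unsupervised model such as those named in the abstract.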

Journal: International Journal of Computer Applications
Publication Date: Feb 15, 2013
Citations: 8

Similar Papers

Performance Evaluation of User-Behaviour Techniques of Web Spam Detection Models
Network and Complex Systems | VOL. 10
01 Dec 2019

Analysis of unsupervised learning techniques for face recognition
Dinesh Kumar ... Shakti Kumar
International Journal of Imaging Systems and Technology | VOL. 20
16 Aug 2010

Hybrid spamicity score approach to web spam detection
Siddu P Algur ... Neha Tarannum Pendari
01 Mar 2012

Web spam detection based on improved tri-training
Hailong Li
01 May 2014