Abstract

Web spam is one of the significant problems facing search engines. It wastes resources and time, decreases the quality of results, and leads to user discontent. The two main approaches to detecting spam web pages are link-based and content-based analysis. In this study, we focus on content-based analysis of both the user-visible text and the source code of a web page to propose a set of features for web spam detection. We explore the relationship between the types and frequency of HTML (HyperText Markup Language) tags used in a web page's source code. We also examine the structure of the URL as another source of information. Finally, the user-visible content of a web page is analyzed semantically, using Latent Dirichlet Allocation to identify the relevance among the topics present in the text as well as the coherence of the text. Experimental results show that the proposed features increase the index of balanced accuracy from 0.33 to 0.69 and improve the web spam detection rate.
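To make the feature families concrete, the following is a minimal sketch (not the paper's actual feature set) of the two source-code-oriented ideas the abstract mentions: counting the types and frequency of HTML tags, and extracting structural properties of a URL. All feature names and thresholds here are illustrative assumptions, implemented with the Python standard library only.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse


class TagCounter(HTMLParser):
    """Counts the types and frequency of HTML start tags in a page's source."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()

    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1


def tag_features(html: str) -> dict:
    """Hypothetical tag-based features: distinct tag types and the
    fraction of tags that are anchors (spam pages are often link-heavy)."""
    parser = TagCounter()
    parser.feed(html)
    total = sum(parser.tags.values()) or 1
    return {
        "distinct_tags": len(parser.tags),
        "anchor_ratio": parser.tags["a"] / total,
    }


def url_features(url: str) -> dict:
    """Hypothetical URL-structure features: hostname length, digit and
    hyphen counts in the hostname, and path depth."""
    parsed = urlparse(url)
    host = parsed.netloc
    return {
        "host_length": len(host),
        "host_digits": sum(c.isdigit() for c in host),
        "host_hyphens": host.count("-"),
        "path_depth": len([p for p in parsed.path.split("/") if p]),
    }
```

In practice such features would be fed, alongside the LDA-derived topic-coherence scores, into a classifier trained on labeled spam and non-spam pages; the exact feature set and model are described in the full paper.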
