Web Spam Research Articles

In today’s era, a sustainable city is evaluated by the services it provides to society, so the design of its integral parts should focus on benefiting people. Society uses the Internet extensively through search engines, and search engines differ in the accuracy and time with which they retrieve information from cloud computing repositories around the world. The literature shows that webpage ranking reduces the time a user spends surfing, which saves a significant amount of energy in computation and network transmission. Most earlier solutions documented in the literature rely on the hyperlink structure of the web graph, which consumes considerable energy during computation and may exacerbate the link leakage problem by increasing the frequency of spam pages. Given the energy consumption of various smart gadgets, hyperlink structure alone is no longer sufficient for predicting webpage relevance; user surfing activity reveals a page’s true importance. To improve search engine accuracy and speed, and thereby lower energy consumption, it is critical to demote spam pages. Among the existing ranking algorithms in the literature, one of the most important is the PageRank algorithm used in Google’s ranking module. With these points in mind, this paper surveys and summarizes various page ranking algorithms based on supervised learning with respect to selected parameters and the experiments performed. Using this information, it presents a detailed taxonomy of search engine results. It also examines PageRank algorithms through the supervised learning techniques that existing proposals apply to obtain and process their results. In a nutshell, the PageRank methodology is surveyed with respect to web spam detection, which cognitive systems in smart cities demand.
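
As a reference point for the PageRank discussion above, here is a minimal power-iteration sketch of the classic formulation. The damping factor of 0.85, the uniform redistribution of score from dangling pages, the function name `pagerank`, and the toy graph are assumptions for illustration; this is not the surveyed paper's method or Google's production ranker.

```python
def pagerank(graph, damping=0.85, tol=1e-10, max_iter=200):
    """Classic PageRank by power iteration (illustrative sketch).

    graph: dict mapping each page to the list of pages it links to.
    Pages with no out-links ("dangling" pages) spread their score
    uniformly over all pages, so the scores always sum to 1.
    """
    # Collect every page, including pages that only appear as link targets.
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    if not nodes:
        return {}
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}

    for _ in range(max_iter):
        # Total score currently held by dangling pages.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        # Teleportation term plus the uniformly redistributed dangling mass.
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        # Each page splits its damped score equally among its out-links.
        for src in nodes:
            out = graph.get(src, [])
            if out:
                share = damping * rank[src] / len(out)
                for dst in out:
                    new_rank[dst] += share
        # Stop once the L1 change between iterations is negligible.
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    # Hypothetical toy web: an honest three-page cycle plus a small link farm
    # in which "target" and three "spam" pages link to one another.
    web = {
        "a": ["b"], "b": ["c"], "c": ["a"],
        "target": ["spam1", "spam2", "spam3"],
        "spam1": ["target"], "spam2": ["target"], "spam3": ["target"],
    }
    for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
        print(f"{page:>7}: {score:.3f}")
```

With these assumed parameters, `target` converges to roughly 0.27 of the total score, while each page in the honest cycle gets 1/7 (about 0.143). That gap illustrates why the abstract argues that hyperlink structure alone can be gamed by spam pages.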


In this modern era, people use the web to share information and to deliver services and products. Information seekers use search engines (SEs) such as Google, Bing, and Yahoo to search for products, services, and information. However, web spamming is one of the most significant issues SEs face because it dramatically degrades the quality of their results. Its economic impact is enormous because web spammers exploit SE indexing as a source of massive free advertising to increase traffic to a targeted website. Using various web-spamming techniques, spammers trick an SE into ranking irrelevant web pages above relevant ones in the search engine results pages (SERPs). Consequently, these highly ranked but unrelated web pages give the user insufficient or inappropriate information. Several researchers from industry and academia are working on detecting spam web pages, yet no technique capable of catching all spam web pages on the World Wide Web (WWW) has been presented. This research proposes an improved framework for content- and link-based web-spam identification. For content-based web-spam detection, the framework uses stopwords, keyword frequency, the part-of-speech (POS) ratio, a spam-keywords database, and copied-content algorithms. For link-based web-spam detection, we first exposed the relationship network behind link-based web spamming and then used a paid-link database, neighbour pages, spam signals, and link-farm algorithms. Finally, we combined all the content- and link-based spam identification algorithms to identify both types of spam. The WEBSPAM-UK2006 and WEBSPAM-UK2007 datasets were used to conduct experiments and obtain threshold values. A promising F-measure of 79.6% with 81.2% precision shows the applicability and effectiveness of the proposed approach.
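
The content-based signals named in this abstract (stopwords, keyword frequency, a spam-keywords database, and copied content) can be approximated in a few lines. The sketch below is a hypothetical illustration, not the authors' framework: the tiny `STOPWORDS` and `SPAM_KEYWORDS` sets, the feature names, and the use of word 3-shingles with Jaccard similarity for copied content are all assumptions, and the POS ratio is omitted because it requires a part-of-speech tagger.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny word lists; real systems use curated databases.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for", "on", "with"}
SPAM_KEYWORDS = {"cheap", "free", "casino", "loan", "viagra"}


def tokenize(text):
    """Lowercase word tokens made of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def content_features(text):
    """Return a few simple content-based spam signals for one page's text."""
    tokens = tokenize(text)
    if not tokens:
        return {"stopword_ratio": 0.0, "top_keyword_ratio": 0.0, "spam_keyword_ratio": 0.0}
    counts = Counter(tokens)
    content_counts = Counter(t for t in tokens if t not in STOPWORDS)
    top = content_counts.most_common(1)[0][1] if content_counts else 0
    return {
        # Keyword-stuffed pages often contain unusually few function words.
        "stopword_ratio": sum(counts[w] for w in STOPWORDS) / len(tokens),
        # Share of tokens taken by the single most repeated non-stopword.
        "top_keyword_ratio": top / len(tokens),
        # Share of tokens that appear in the spam-keyword list.
        "spam_keyword_ratio": sum(counts[w] for w in SPAM_KEYWORDS) / len(tokens),
    }


def shingles(text, k=3):
    """Set of overlapping word k-grams ("shingles") in the text."""
    tokens = tokenize(text)
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def copied_content_similarity(text_a, text_b, k=3):
    """Jaccard similarity of word k-shingles; values near 1 suggest copied content."""
    a, b = shingles(text_a, k), shingles(text_b, k)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    spam = "cheap loan cheap loan cheap loan free casino cheap loan"
    ham = "The history of the city is documented in the archive of the library"
    print(content_features(spam))
    print(content_features(ham))
    print(copied_content_similarity(ham, ham), copied_content_similarity(ham, spam))
```

In a real pipeline, thresholds on features like these would be tuned on labelled data, which is the role the abstract assigns to the WEBSPAM-UK2006 and WEBSPAM-UK2007 datasets.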


Articles published on Web Spam

PRADA: Practical Black-box Adversarial Attacks against Neural Ranking Models

Efficient index-free SimRank similarity search in large graphs by discounting path lengths

SecureEngine: Spammer classification in cyber defence for leveraging green computing in Sustainable city

E-mail Spam Classification Using Grasshopper Optimization Algorithm and Neural Networks

An Improved Framework for Content- and Link-Based Web-Spam Detection: A Combined Approach

Artificial Intelligence and Edge Computing-Enabled Web Spam Detection for Next Generation IoT Applications

Advances in spam detection for email spam, web spam, social network spam, and review spam: ML-based and nature-inspired-based techniques

Blog Backlinks Malicious Domain Name Detection via Supervised Learning

A fuzzy Dempster–Shafer classifier for detecting Web spams

A Fuzzy-Based Approach to Enhance Cyber Defence Security for Next-Generation IoT

Using deep belief network to demote web spam

Detecting Web Spam Based on Novel Features from Web Page Source Code

GT2FS-SMOTE: An Intelligent Oversampling Approach Based Upon General Type-2 Fuzzy Sets to Detect Web Spam

Adaptive evaluation model of web spam based on link relation

Spams classification and their diffusibility prediction on Twitter through sentiment and topic models

Multi-Scale Anomaly Detection on Attributed Networks

An efficient deep learning-based scheme for web spam detection in IoT environment

CNN Based Malicious Website Detection by Invalidating Multiple Web Spams

A Novel Set of Contextual Features for Web Spam Detection

AATMS: An Anti-Attack Trust Management Scheme in VANET