Practical Web Spam Lifelong Machine Learning System with Automatic Adjustment to Current Lifecycle Phase

Marcin Luckner

doi:10.1155/2019/6587020

Abstract

Machine learning techniques are a standard approach in spam detection. Their quality depends on the quality of the learning set, and when the set is out of date, classification quality falls rapidly. The most popular public web spam dataset that can be used to train a spam detector, WEBSPAM-UK2007, is over ten years old. There is therefore a need for a lifelong machine learning system that can replace detectors based on a static learning set. In this paper, we propose a novel web spam recognition system. The system automatically rebuilds the learning set, using external data from spam traps and popular web services, to avoid classification based on outdated data. Using built-in automatic selection of the active classifier, the system reaches productive accuracy very quickly despite a limited learning set. A test on real data from Quora, Reddit, and Stack Overflow demonstrated high recognition quality. The average accuracy and the F-measure were both 0.98 in semiautomatic mode and both 0.96 in fully automatic mode.
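
The abstract does not say how the active classifier is chosen, so the following is only a minimal sketch of one plausible reading: score each candidate on the most recent labelled samples and activate the one with the best F-measure. The names `f_measure`, `select_active`, `candidates`, and `recent`, the two toy detectors, and the example texts are all hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, List, Tuple

Sample = Tuple[str, bool]          # (text, is_spam)
Classifier = Callable[[str], bool]  # returns True when text looks like spam


def f_measure(predicted: List[bool], actual: List[bool]) -> float:
    """F1 score for the spam class: harmonic mean of precision and recall."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def select_active(candidates: Dict[str, Classifier], recent: List[Sample]) -> str:
    """Return the name of the candidate with the best F1 on recent labelled data."""
    actual = [label for _, label in recent]
    scores = {
        name: f_measure([clf(text) for text, _ in recent], actual)
        for name, clf in candidates.items()
    }
    return max(scores, key=scores.get)


# Two toy, hypothetical detectors standing in for trained models.
candidates = {
    "keyword": lambda text: "buy now" in text.lower(),
    "link_count": lambda text: text.count("http") >= 2,
}

# A hypothetical window of recently labelled posts.
recent = [
    ("Buy now cheap pills http://a.example", True),
    ("How do I reverse a list in Python?", False),
    ("BUY NOW http://a.example http://b.example", True),
    ("See http://docs.example and http://wiki.example for details", False),
]

active = select_active(candidates, recent)
```

Under these assumptions, re-running `select_active` each time the learning set is rebuilt would let the system switch detectors as the data drifts; the paper's actual selection criterion may differ.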

Journal: Security and Communication Networks
Publication Date: Feb 20, 2019
Citations: 3
License type: CC BY 4.0
