A Constant Time Complexity Spam Detection Algorithm for Boosting Throughput on Rule-Based Filtering Systems

Tian Xia

doi:10.1109/access.2020.2991328

Abstract

Along with the explosive growth of spam, anti-spam technologies, including rule-based approaches and machine learning, have developed rapidly. In the anti-spam industry, rule-based systems (RBS) have become the most prominent method for fighting spam because their rules can be enriched and updated remotely. However, filtering throughput has always been a major challenge for RBS. In particular, the explosive spread of obfuscated words leads to frequent rule updates and extensive expansion of the rule vocabulary. These ever-growing obfuscated words slow down filtering and reduce throughput. This paper addresses the throughput issue and proposes a rule-based spam detection algorithm with constant time complexity. The algorithm's processing speed is constant, independent of the rules and the size of their vocabulary. A new data structure, the Hash Forest, and a rule encoding method are developed to make constant time complexity possible. Instead of traversing every spam term in the rules, the proposed algorithm detects spam terms by checking only a very small portion of all terms. The experimental results show the effectiveness of the proposed algorithm.

Highlights

Internet use has grown explosively since the Internet was first established in 1969
If the time complexity of the filtering algorithms of rule-based systems (RBS) can be reduced to constant, the throughput issue is solved, since expansion of the rules and their vocabulary will no longer slow down filtering
EXPERIMENTAL RESULTS: The experiment is based on production data from the short message service (SMS) company that cooperated with us, as mentioned in the Introduction

Summary

INTRODUCTION

Internet use has grown explosively since the Internet was first established in 1969. The scale of data has increased overwhelmingly as well [1], especially with the wide use of social networks, personal communication tools, email and short messages (SMS). This ease of communication has encouraged the emergence of large volumes of spam. If the time complexity of the filtering algorithms of RBS can be reduced to constant, the throughput issue is solved, since expansion of the rules and their vocabulary will no longer slow down filtering. The project was carried out to increase filtering speed to meet an overwhelming SMS sending throughput requirement. It successfully addressed the throughput issue and reduced the time complexity of the spam detection algorithm to constant, O(1). The encoding method helps the filter evaluate the operators in rule expressions automatically.
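The contrast between traversing every rule term and checking only a few can be illustrated with a minimal sketch. This is not the paper's Hash Forest or its rule encoding, whose details are not given here; it assumes a single flat hash table, whitespace tokenization, and rules that fire when any one of their terms appears. The names `build_term_index`, `scan_by_traversal` and `scan_by_lookup` are hypothetical.

```python
from typing import Dict, List, Set

# Assumption: a rule is an id plus a list of spam terms, and it matches
# when at least one term occurs as a token in the message.
Rules = Dict[str, List[str]]


def build_term_index(rules: Rules) -> Dict[str, Set[str]]:
    """Map each lower-cased term to the ids of the rules that contain it."""
    index: Dict[str, Set[str]] = {}
    for rule_id, terms in rules.items():
        for term in terms:
            index.setdefault(term.lower(), set()).add(rule_id)
    return index


def scan_by_traversal(message: str, rules: Rules) -> Set[str]:
    """Baseline: visit every term of every rule, so work grows with the vocabulary."""
    tokens = set(message.lower().split())
    return {
        rule_id
        for rule_id, terms in rules.items()
        if any(term.lower() in tokens for term in terms)
    }


def scan_by_lookup(message: str, index: Dict[str, Set[str]]) -> Set[str]:
    """Hash lookup: one average-case O(1) probe per message token."""
    matched: Set[str] = set()
    for token in message.lower().split():
        matched |= index.get(token, set())
    return matched


# Illustrative rules with obfuscated variants of one word (made-up data).
rules = {"r1": ["viagra", "v1agra", "vi@gra"], "r2": ["lottery"]}
index = build_term_index(rules)
print(scan_by_lookup("Cheap V1AGRA here", index))
```

In this sketch, adding more obfuscated variants to a rule enlarges the index but does not add work per message in `scan_by_lookup`, whose cost depends on the number of tokens in the message; for length-limited SMS that number is bounded. `scan_by_traversal` slows down as the vocabulary grows.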

RELATED WORK

AN ADDITIONAL FEATURE

CONCLUSION AND FUTURE WORK

Journal: IEEE Access	Publication Date: Jan 1, 2020
Citations: 19	License type: CC BY 4.0
