A weighted feature enhanced Hidden Markov Model for spam SMS filtering

Tian Xia, Xuemin Chen

doi:10.1016/j.neucom.2021.02.075

Abstract

Short message service (SMS) is one of the most favored communication services in daily life. However, this service is being misused by spammers. Rule-based systems (RBS) and content-based filtering (CBF) techniques have been developed to filter out spam messages. New rules can easily be added to an RBS, but throughput usually falls as the number of rules grows. CBF techniques use machine learning methods to extract features from the SMS message body according to word frequency and distribution; those based on the bag-of-words (BoW) assumption ignore word order. Striving to improve performance, researchers have developed hybrid models that make the algorithms ever more complex. In addition, because frequent model training and deployment are time consuming, the anti-spam industry still relies mainly on rule-based systems, whose throughput issue remains unsolved. A discrete Hidden Markov Model (HMM) was proposed in our previous study to address these issues, and the HMM method achieved performance comparable to that of deep learning methods. To further improve the performance of the HMM method, we propose a new approach that weights and labels the words in an SMS message to form the observation sequence for the HMM. The weighted feature enhanced HMM achieves higher accuracy and much faster training and filtering speeds, meeting the requirements of the anti-spam industry. The performance comparison with other machine learning methods is conducted on the same open repository data set maintained by the University of California, Irvine (UCI). Experimental results show that the weighted feature enhanced HMM outperforms the LSTM (long short-term memory) model and comes close to the CNN (convolutional neural network) in terms of classification accuracy. In addition, a Chinese SMS data set is used to further validate filtering accuracy and filtering speed.
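
The abstract does not spell out the weighting scheme or the HMM topology, so the following is only a minimal sketch of the general idea: weight words, turn the weights into discrete labels, and score the resulting observation sequence with a discrete HMM. Everything specific here is an assumption made for illustration and is not taken from the paper: the log-ratio weight in `word_weights`, the three-symbol labeling and `threshold` in `to_observations`, the hand-set parameters in `SPAM_HMM` and `HAM_HMM`, and the decision rule that compares two class-specific HMMs by likelihood.

```python
import math
from collections import Counter

def word_weights(spam_docs, ham_docs, smoothing=1.0):
    """Hypothetical weight: smoothed log-ratio of a word's frequency in spam vs. ham."""
    spam_counts = Counter(w for doc in spam_docs for w in doc)
    ham_counts = Counter(w for doc in ham_docs for w in doc)
    vocab = set(spam_counts) | set(ham_counts)
    spam_total = sum(spam_counts.values()) + smoothing * len(vocab)
    ham_total = sum(ham_counts.values()) + smoothing * len(vocab)
    return {
        w: math.log((spam_counts[w] + smoothing) / spam_total)
        - math.log((ham_counts[w] + smoothing) / ham_total)
        for w in vocab
    }

def to_observations(message, weights, threshold=0.5):
    """Label each word: 0 = ham-leaning, 1 = neutral or unseen, 2 = spam-leaning."""
    symbols = []
    for word in message:
        score = weights.get(word, 0.0)
        symbols.append(2 if score > threshold else 0 if score < -threshold else 1)
    return symbols

def forward_log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm for a discrete HMM; returns log P(obs | model)."""
    if not obs:
        return 0.0
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)  # rescale each step to avoid numerical underflow
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_prob

# Illustrative, hand-set parameters (assumed, not trained and not from the paper).
_START = [0.5, 0.5]
_TRANS = [[0.7, 0.3], [0.3, 0.7]]
SPAM_HMM = (_START, _TRANS, [[0.1, 0.3, 0.6], [0.2, 0.5, 0.3]])
HAM_HMM = (_START, _TRANS, [[0.6, 0.3, 0.1], [0.3, 0.5, 0.2]])

def classify(obs):
    """Return 'spam' if the spam model explains the sequence better, else 'ham'."""
    spam_score = forward_log_likelihood(obs, *SPAM_HMM)
    ham_score = forward_log_likelihood(obs, *HAM_HMM)
    return "spam" if spam_score > ham_score else "ham"

if __name__ == "__main__":
    spam = [["free", "prize", "call"], ["win", "free", "cash"]]
    ham = [["see", "you", "at", "lunch"], ["call", "me", "later"]]
    weights = word_weights(spam, ham)
    observations = to_observations(["free", "prize", "call"], weights)
    print(observations, classify(observations))
```

In practice the HMM parameters would be estimated from labeled messages rather than set by hand. Because the model works on an ordered sequence of labels, it keeps the word-order information that a bag-of-words representation discards.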

Journal: Neurocomputing
Publication Date: Mar 10, 2021
Citations: 28

Similar Papers

A Discrete Hidden Markov Model for SMS Spam Detection
Tian Xia ... Xuemin Chen
Applied Sciences | VOL. 10
21 Jul 2020

Integrating different acoustic and syntactic language models in a continuous speech recognition system
Amparo Varona ... In Torres
16 Oct 2000

Android-Based Short Message Service Filtering using Long Short-Term Memory Classification Model
M Laylul Mustagfirin ... Giri Wahyu Wiriasto
Khazanah Informatika : Jurnal Ilmu Komputer dan Informatika | VOL. 8
30 Oct 2022

A survey of feature selection methods for Gaussian mixture models and hidden Markov models
Stephen Adams ... Peter A Beling
Artificial Intelligence Review | VOL. 52
25 Sep 2017
