Abstract

Many machine learning methods have been applied to short messaging service (SMS) spam detection, including traditional methods such as naïve Bayes (NB), the vector space model (VSM), and the support vector machine (SVM), and novel methods such as long short-term memory (LSTM) and the convolutional neural network (CNN). These methods are based on the well-known bag of words (BoW) model, which assumes that documents are unordered collections of words. This assumption overlooks an important piece of information: word order. Moreover, because SMS messages are limited in length, term frequency, which counts the occurrences of each word in a message, cannot distinguish the importance of words. This paper proposes a new method based on the discrete hidden Markov model (HMM) that uses word order information and addresses the low term frequency issue in SMS spam detection. The widely adopted SMS spam dataset from the UCI machine learning repository is used to analyze the performance of the proposed HMM method. Its overall performance is comparable to that of deep learning models such as CNN and LSTM. A Chinese SMS spam dataset with 2000 messages is used for further performance evaluation. Experiments show that the proposed HMM method is not language-sensitive and can identify spam with high accuracy on both datasets.
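
The two limitations of BoW named above can be illustrated with a minimal sketch. The example messages and the `bag_of_words` helper are hypothetical and are not taken from the paper; the tokenizer is a plain whitespace split, which is an assumption made only for illustration.

```python
from collections import Counter


def bag_of_words(message: str) -> Counter:
    """Count lowercase whitespace-separated tokens, discarding their order."""
    return Counter(message.lower().split())


# Two hypothetical messages containing the same words in a different order.
a = "call now to claim your free prize"
b = "your free call to claim prize now"

# BoW cannot tell the two messages apart, although the token sequences differ.
same_bag = bag_of_words(a) == bag_of_words(b)
same_sequence = a.split() == b.split()

# In a short message, the highest term frequency is often just 1,
# so the counts carry little information about word importance.
max_count = max(bag_of_words(a).values())

print(same_bag, same_sequence, max_count)
```

A sequence model such as an HMM consumes the ordered token list rather than the counts, so it can in principle separate inputs that BoW treats as identical.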

Highlights

  • The short messaging service (SMS) is one of the most popular and common communication services

  • This research uses the widely adopted UCI repository dataset for performance evaluation

  • This paper proposes a discrete hidden Markov model for SMS spam detection; it is the first research to take advantage of word order information to detect spam SMS

Summary

Introduction

One of the most popular and common communication services is the short message service (SMS). SMS traffic volumes rose from 1.46 billion in 2000 to 7.9 trillion in 2012 [1]. The number of SMS-capable mobile phone users reached 6.1 billion by 2015 [2]. The growth in mobile users has generated a great deal of revenue [1]. According to the latest statistics [3], global SMS revenue is predicted to reach 83.2 billion dollars in 2022, even though revenue has continued to decrease since 2017. About half (43 billion dollars) of the global SMS revenue belongs to the global P2P (person-to-person) SMS messages market and the other half (40.2 billion dollars) belongs to

