The Impact of Feature Extraction and Selection on SMS Spam Filtering

A K Uysal, S Ergin, S Gunal, E Sora Gunal

doi:10.5755/j01.eee.19.5.1829


Open Access

https://doi.org/10.5755/j01.eee.19.5.1829


Abstract

This paper investigates the impact of several feature extraction and feature selection approaches on the filtering of short message service (SMS) spam messages in two languages, Turkish and English. The full feature set of the filtering framework consists of features originating from the bag-of-words (BoW) model together with an ensemble of structural features (SF) specific to the spam problem. The distinctive BoW features are identified using information-theoretic feature selection methods. Various combinations of the BoW features and SFs are then fed into widely used pattern classification algorithms to classify SMS messages. The filtering framework is evaluated on both Turkish and English SMS message datasets. For this purpose, the first publicly available Turkish SMS message collection was also compiled as part of the study. Comprehensive experimental analysis on the respective datasets revealed that combinations of BoW features and SFs, rather than BoW features alone, provide better classification performance on both datasets. The effectiveness of the feature selection methods, however, differs slightly between the two languages.
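The feature construction described in the abstract can be sketched roughly as follows. This is a minimal illustration in Python, not the authors' implementation: the tokenizer, the function names, and the three structural features (message length, digit count, URL presence) are assumptions chosen for the example, since this page does not list the SFs actually used.

```python
import re
from collections import Counter


def tokenize(message):
    """Lowercase the message and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"\w+", message.lower())


def build_vocabulary(messages):
    """Collect the sorted set of all tokens seen in the message collection."""
    return sorted({tok for m in messages for tok in tokenize(m)})


def bow_vector(message, vocabulary):
    """Term-count bag-of-words vector aligned with the vocabulary order."""
    counts = Counter(tokenize(message))
    return [counts[term] for term in vocabulary]


def structural_features(message):
    """Hypothetical structural features; the paper's actual SFs may differ."""
    lowered = message.lower()
    return [
        len(message),                               # message length in characters
        sum(ch.isdigit() for ch in message),        # number of digit characters
        int("http" in lowered or "www." in lowered),  # 1 if a URL-like string occurs
    ]


def feature_vector(message, vocabulary):
    """Concatenate BoW features with structural features."""
    return bow_vector(message, vocabulary) + structural_features(message)


messages = ["WIN a FREE prize, call 0800 now", "see you at 5"]
vocabulary = build_vocabulary(messages)
vectors = [feature_vector(m, vocabulary) for m in messages]
```

Each resulting vector holds one count per vocabulary term followed by the structural values, so a classifier sees both kinds of evidence in a single input.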

Highlights

In recent years, Short Message Service (SMS) has become one of the most common communication methods due to the rapid increase in the number of mobile phone users worldwide.
BoW features were selected using the chi-square (CHI2) and Gini index (GI) methods, with the number of selected features ranging from 1% to 100% of the full BoW feature set.
For Turkish messages, the highest Micro-F1 score was approximately 0.98, obtained by feeding SF2 together with the 50% of BoW features selected by CHI2 into a support vector machine (SVM) classifier.
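As a rough illustration of ranking BoW terms and keeping a top percentage, the sketch below scores terms with a common 2x2 chi-square statistic and with one Gini index formulation from the text classification literature (the sum over classes of P(t|c)^2 · P(c|t)^2). Both formulas, the function names, and the toy data are assumptions for the example; the paper's exact definitions may differ.

```python
import math
from collections import Counter


def doc_freq_tables(docs, labels):
    """Per-class document frequencies of each term, plus per-class document counts."""
    df = {c: Counter() for c in set(labels)}
    for tokens, label in zip(docs, labels):
        df[label].update(set(tokens))  # count each term once per document
    return df, Counter(labels)


def chi2_score(term, df, n_docs):
    """Maximum over classes of the 2x2 chi-square statistic for the term."""
    n = sum(n_docs.values())
    best = 0.0
    for c in n_docs:
        a = df[c][term]                                  # in class, has term
        b = sum(df[o][term] for o in n_docs if o != c)   # other classes, has term
        c_no = n_docs[c] - a                             # in class, lacks term
        d = (n - n_docs[c]) - b                          # other classes, lacks term
        denom = (a + c_no) * (b + d) * (a + b) * (c_no + d)
        if denom:
            best = max(best, n * (a * d - c_no * b) ** 2 / denom)
    return best


def gini_score(term, df, n_docs):
    """Sum over classes of P(term|class)^2 * P(class|term)^2."""
    with_term = sum(df[c][term] for c in n_docs)
    if with_term == 0:
        return 0.0
    return sum(
        (df[c][term] / n_docs[c]) ** 2 * (df[c][term] / with_term) ** 2
        for c in n_docs
    )


def select_top(docs, labels, score_fn, fraction):
    """Keep the top `fraction` of vocabulary terms ranked by `score_fn`."""
    df, n_docs = doc_freq_tables(docs, labels)
    vocab = sorted({t for d in docs for t in d})
    ranked = sorted(vocab, key=lambda t: (-score_fn(t, df, n_docs), t))
    keep = max(1, math.ceil(fraction * len(vocab)))
    return ranked[:keep]


docs = [["win", "free", "prize"], ["free", "call", "now"],
        ["see", "you", "now"], ["call", "me", "later"]]
labels = ["spam", "spam", "legit", "legit"]
top_chi2 = select_top(docs, labels, chi2_score, 0.10)
top_gini = select_top(docs, labels, gini_score, 0.10)
```

Sweeping `fraction` from 0.01 to 1.0 mirrors the 1% to 100% range mentioned in the highlights; ties are broken alphabetically here purely to keep the toy output deterministic.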

Summary

INTRODUCTION

Short Message Service (SMS) has become one of the most common communication methods due to the rapid increase in the number of mobile phone users worldwide. A framework utilizing content-based filtering and challenge-response was introduced in [6]. Another SMS anti-spam system, combining behavior-based social network analysis and temporal analysis, was presented in [7]. Building on these studies, this paper extensively analyses the combined effects of several feature extraction and feature selection methods on filtering SMS spam messages in two languages, Turkish and English. The selected features are combined with the structural features and fed into two distinct pattern classification algorithms, k-nearest neighbor and support vector machine, to classify SMS messages as either spam or legitimate.
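The classification step can be illustrated with a small k-nearest neighbor sketch scored by Micro-F1. This is a simplified example under stated assumptions: Euclidean distance, k = 3, the toy feature vectors, and the pooling of both classes in the Micro-F1 computation are choices made here, not details confirmed by this page, and the SVM classifier is omitted for brevity.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, x, k=3):
    """Label x by majority vote among its k nearest training vectors."""
    neighbours = sorted(
        (math.dist(x, tx), ty) for tx, ty in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP, and FN over all classes, then compute F1."""
    tp = fp = fn = 0
    for c in set(y_true) | set(y_pred):
        for t, p in zip(y_true, y_pred):
            tp += (t == c and p == c)
            fp += (t != c and p == c)
            fn += (t == c and p != c)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy two-dimensional feature vectors (hypothetical values).
train_x = [[5, 1], [4, 1], [0, 0], [1, 0]]
train_y = ["spam", "spam", "legit", "legit"]

test_x = [[4, 0], [0, 1]]
test_y = ["spam", "legit"]
predictions = [knn_predict(train_x, train_y, x) for x in test_x]
score = micro_f1(test_y, predictions)
```

When every message receives exactly one label and all classes are pooled, Micro-F1 computed this way coincides with plain accuracy.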

DATASETS

FEATURE EXTRACTION

FEATURE SELECTION

CLASSIFICATION

EXPERIMENTAL WORK

Part B

Findings

CONCLUSIONS


Journal: Electronics and Electrical Engineering
Publication Date: May 15, 2013
Citations: 33
License type: CC BY


Similar Papers

A novel feature extraction approach in SMS spam filtering for mobile communication: one‐dimensional ternary patterns
Yilmaz Kaya ... Ömer Faruk Ertuğrul
Security and Communication Networks | VOL. 9
19 Oct 2016

Content-based SMS spam filtering based on the Scaled Conjugate Gradient backpropagation algorithm
Waddah Waheeb ... Mustafa Mat Deris
01 Aug 2015

A Framework for SMS Spam and Phishing Detection in Malay Language: a Case Study
...
International Review on Computers and Software (IRECOS) | VOL. 9
31 Jul 2014

Comparative study on SMS spam message detection with different machine learning methods for safety communication
N Krishnamoorthy ... Seyedali Mirjalili
07 Nov 2022
