A New Fine-grain SMS Corpus and Its Corresponding Classifier Using Probabilistic Topic Model

Jialin Ma, Yongjun Zhang, Zhijian Wang, Bolun Chen

doi:10.3837/tiis.2018.02.004

Abstract

SMS spam is now widespread in many countries, and the standards for filtering it differ from country to country. However, current techniques and research on SMS spam filtering all focus on dividing SMS messages into two classes: legitimate and illegitimate. This binary division does not match actual conditions and needs. Furthermore, existing work faces several difficulties: (1) High-quality, large-scale SMS spam corpora are very scarce, and finely categorized SMS spam corpora do not exist at all, which seriously hampers research. (2) The limited length of SMS messages leads to a lack of sufficient features. These factors seriously degrade the performance of traditional classifiers (such as SVM, K-NN, and Bayes). In this paper, we present a new finely categorized SMS spam corpus, which is unique and, as far as we know, the largest of its kind. In addition, we propose a classifier based on a probabilistic topic model, which can alleviate the feature sparsity problem in SMS spam filtering. Moreover, we compare the approach with three typical classifiers on the new SMS spam corpus. The experimental results show that the proposed approach is more effective for the task of SMS spam filtering.
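The feature sparsity mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' corpus or classifier: the four messages, the whitespace tokenizer, and the helper names `bag_of_words` and `sparsity` are all assumptions made for illustration. It only shows why short messages yield mostly-zero bag-of-words vectors, which is the problem a topic model is meant to ease by describing each message with a few topic proportions instead of one dimension per word.

```python
from collections import Counter


def bag_of_words(messages):
    """Build a sorted vocabulary and one word-count vector per message."""
    tokenized = [m.lower().split() for m in messages]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for toks in tokenized:
        vec = [0] * len(vocab)
        for tok, count in Counter(toks).items():
            vec[index[tok]] = count
        vectors.append(vec)
    return vocab, vectors


def sparsity(vectors):
    """Return the fraction of zero entries across all vectors."""
    total = sum(len(v) for v in vectors)
    zeros = sum(1 for v in vectors for x in v if x == 0)
    return zeros / total if total else 0.0


# Hypothetical toy messages, not taken from the paper's corpus.
messages = [
    "win a free prize now",
    "cheap loans call now",
    "meeting moved to friday",
    "see you at lunch",
]

vocab, vectors = bag_of_words(messages)
print(len(vocab), round(sparsity(vectors), 3))
```

Even with four messages, most entries of each vector are zero, and the share of zeros grows as the vocabulary grows while message length stays capped.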

Journal: KSII Transactions on Internet and Information Systems
Publication Date: Feb 28, 2018
Citations: 1

Similar Papers

Intelligent SMS Spam Filtering Using Topic Model
Jialin Ma ... Jinling Liu
01 Sep 2016

Semi-supervised novelty detection with one class SVM for SMS spam detection
Suleiman Y Yerima ... Abul Bashar
01 Jun 2022

Analisis Klasifikasi SMS Spam Menggunakan Logistic Regression
Ferin Reviantika Suprihati
Jurnal Sistem Cerdas | VOL. 4
28 Dec 2021

Topic evolution based on the probabilistic topic model: a review
Houkui Zhou ... Huimin Yu
Frontiers of Computer Science | VOL. 11
04 May 2017
