Nepali SMS Filtering Using Decision Trees, Neural Network and Support Vector Machine

Tej Bahadur Shahi, Subarna Shakya

doi:10.1109/icacccn.2018.8748286

Abstract

Automated spam detection and filtering is the task of categorizing Short Message Service (SMS) messages into two predefined categories, Spam and Non-Spam, based on their content, using models learned from a training SMS dataset. This work evaluates some of the most widely used machine learning techniques (Decision Trees, Support Vector Machine (SVM) and Neural Networks) on the automatic SMS filtering problem. To evaluate the system, a Nepali SMS corpus of 500 messages (350 Non-Spam and 150 Spam) was collected manually, along with some existing SMS data. Classification and Regression Tree (CART) is used for Decision Trees, linear and RBF kernels are used for SVM, and back-propagation is used for the Neural Network. To train these models, TF-IDF as well as other binary features are extracted from the preprocessed SMS corpus. The empirical analysis shows that the Neural Network with back-propagation outperforms the other three models, with an average classification accuracy of 85.75%. It is followed by SVM with a linear kernel at 82.50% and Decision Trees at 77.15%. The weakest model was SVM with an RBF kernel, at 60.03%.
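The TF-IDF feature extraction mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which TF-IDF variant was used, so the weighting below (relative term frequency times log(N/df)) is an assumption, and the function name `tf_idf` and the toy English messages are hypothetical placeholders rather than data from the Nepali corpus.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed weighting: (count / document length) * log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical toy messages, already tokenized on whitespace.
messages = [
    "win free prize now".split(),
    "call me when free".split(),
    "meeting at noon".split(),
]
vectors = tf_idf(messages)
```

Terms that occur in fewer messages receive larger weights, so here "win" is weighted more heavily than "free" in the first message. Vectors of this kind, combined with the binary features, would then be passed to each classifier.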

Full Text