Abstract

Automated spam detection and filtering is the task of classifying Short Message Service (SMS) messages into two predefined categories, Spam and Non-Spam, based on their content, using models learned from a training SMS dataset. This work evaluates three widely used machine learning techniques, Decision Trees, Support Vector Machines (SVM) and Neural Networks, on the automatic SMS filtering problem. For the experiments, a Nepali SMS corpus of 500 messages (350 Non-Spam and 150 Spam) was collected manually, together with some existing SMS data. Classification and Regression Tree (CART) is used for the Decision Tree, linear and RBF kernels are used for the SVM, and back-propagation is used for the Neural Network. To train these models, TF-IDF and other binary features are extracted from the preprocessed SMS corpus. The empirical analysis shows that the Neural Network with back-propagation outperforms the other three models, with an average classification accuracy of 85.75%. It is followed by the linear-kernel SVM with an accuracy of 82.50% and the Decision Tree with an accuracy of 77.15%. The worst-performing model was the SVM with the RBF kernel, with an accuracy of 60.03%.
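To illustrate the TF-IDF feature extraction step mentioned above, the following is a minimal sketch in plain Python. The abstract does not say which TF-IDF variant or preprocessing was used, so the weighting here (term frequency normalized by message length, multiplied by log(N / document frequency)), the whitespace tokenizer, the function names `tokenize` and `tfidf_vectors`, and the toy English messages are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumed preprocessing: lowercase and split on whitespace.
    return text.lower().split()


def tfidf_vectors(messages):
    """Return (vocabulary, one TF-IDF vector per message)."""
    docs = [tokenize(m) for m in messages]
    n_docs = len(docs)
    # Document frequency: number of messages that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty messages
        vectors.append(
            [(counts[t] / total) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, vectors


# Hypothetical toy messages, not taken from the Nepali SMS corpus.
messages = [
    "win free prize now",
    "free recharge offer win",
    "see you at home",
]
vocab, vectors = tfidf_vectors(messages)
```

Each resulting vector could then be passed, possibly alongside binary features, to any of the classifiers compared in this work. Terms that appear in many messages receive a lower weight than terms concentrated in a few messages.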
