Abstract

The ease of use of the Short Message Service (SMS) has given rise to SMS spam: unsolicited and unwanted messages. Many studies have applied machine learning to build models that classify SMS spam. However, most of them rely on traditional methods, with limited exploration of deep learning-based approaches. Traditional methods are limited by their reliance on manual feature extraction, whereas deep learning models learn features directly from the text. Moreover, many of these studies address only binary classification rather than multiclass SMS classification, which can provide more detailed results. This research analyzes a deep learning model for multiclass Indonesian SMS spam classification with six categories and assesses how effectively text augmentation addresses the data imbalance that arises as the number of SMS categories increases. The method uses the Indonesian version of Bidirectional Encoder Representations from Transformers (IndoBERT) together with the Easy Data Augmentation (EDA) technique to address the imbalanced dataset. The evaluation compares the performance of IndoBERT on the original dataset with its performance after EDA is applied to enhance the representation of minority classes. The results show that IndoBERT achieves 91% accuracy in classifying SMS spam. Applying EDA yields a significant improvement in F1-score, with an average increase of 12% in the minority classes, and overall model accuracy rises to 93%. This research concludes that IndoBERT is effective for multiclass SMS spam classification and that EDA is beneficial for handling imbalanced data, contributing to improved model performance.
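
To make the augmentation step concrete, the following is a minimal sketch of how token-level augmentation can rebalance minority classes. It is not the authors' implementation. It covers only two operations commonly associated with EDA (random swap and random deletion) and omits synonym replacement and random insertion, which need an Indonesian thesaurus. The function names, the class labels, the toy messages, and the choice to oversample every class up to the size of the largest one are all assumptions made for illustration.

```python
import random
from collections import Counter


def random_swap(tokens, n_swaps, rng):
    """Swap two randomly chosen token positions, n_swaps times."""
    tokens = list(tokens)
    if len(tokens) < 2:
        return tokens
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens


def random_deletion(tokens, p, rng):
    """Drop each token with probability p, always keeping at least one."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() > p]
    return kept if kept else [rng.choice(tokens)]


def augment_minority(texts, labels, seed=0):
    """Oversample each class to the majority size (an assumed policy)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    by_label = {}
    for text, label in zip(texts, labels):
        by_label.setdefault(label, []).append(text)

    out_texts, out_labels = list(texts), list(labels)
    for label, examples in by_label.items():
        for _ in range(target - counts[label]):
            tokens = rng.choice(examples).split()
            if rng.choice(("swap", "delete")) == "swap":
                new_tokens = random_swap(tokens, 1, rng)
            else:
                new_tokens = random_deletion(tokens, 0.1, rng)
            out_texts.append(" ".join(new_tokens))
            out_labels.append(label)
    return out_texts, out_labels


# Toy data with hypothetical labels; the paper's six categories are not
# named in the abstract.
texts = [
    "besok kita rapat jam sembilan pagi",
    "jangan lupa bawa dokumen ke kantor",
    "nanti sore saya telepon lagi ya",
    "diskon besar hari ini khusus pelanggan setia",
    "selamat anda menang hadiah segera hubungi nomor ini",
]
labels = ["normal", "normal", "normal", "promo", "fraud"]

aug_texts, aug_labels = augment_minority(texts, labels, seed=42)
print(Counter(aug_labels))
```

In a pipeline like the one described, the augmented texts would be used only for the training split before fine-tuning the classifier, so that the evaluation data stays untouched.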
