Evaluating the Efficiency of Vietnamese SMS Spam Detection Techniques

Vu Minh Tuan, Tran Quang Anh, Nguyen Xuan Thang

doi:10.54654/isj.v1i18.932

Abstract

This paper evaluates the efficiency of Vietnamese SMS spam detection methods on different variants of Vietnamese datasets, using both traditional machine learning models and deep learning models. The researchers experimented with five algorithms, namely Support Vector Machine (SVM), Naive Bayes (NB), Random Forest (RF), Convolutional Neural Network (CNN), and Long Short-Term Memory (LSTM), on three different Vietnamese datasets. The findings reveal that the LSTM and CNN models, supported by the transformer-based pre-trained model PhoBERT, were more efficient than the traditional machine learning models. The LSTM model achieved the highest accuracy, 97.77%, on the full-accent Vietnamese dataset. Similarly, the CNN model and the PhoBERT model achieved the highest accuracy, 95.56%, on the non-diacritic Vietnamese dataset.
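The abstract distinguishes a full-accent dataset from a non-diacritic one but does not say how the non-diacritic variant was produced. As an illustration only, the sketch below shows one common way to derive non-diacritic text from accented Vietnamese using Unicode normalization. The function name `strip_vietnamese_diacritics` and the sample message are hypothetical and are not taken from the paper.

```python
import unicodedata


def strip_vietnamese_diacritics(text: str) -> str:
    """Return `text` with Vietnamese tone and vowel marks removed.

    Illustrative assumption: this is not the authors' preprocessing,
    only one plausible way to build a non-diacritic variant.
    """
    # The letters đ/Đ have no canonical decomposition, so map them explicitly.
    text = text.replace("đ", "d").replace("Đ", "D")
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (Unicode category "Mn"), keeping base letters.
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    return unicodedata.normalize("NFC", stripped)


if __name__ == "__main__":
    # Hypothetical promotional message, not drawn from the paper's datasets.
    print(strip_vietnamese_diacritics("Khuyến mãi đặc biệt"))
```

Because many Vietnamese words differ only in their diacritics, removing the marks discards information, which may help explain why the reported accuracies differ between the two dataset variants.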
