A BERT Model for SMS and Twitter Spam Ham Classification and Comparative Study of Machine Learning and Deep Learning Technique

Sarika S Raga, Chaitra B L

doi:10.1109/icraie56454.2022.10054285

Abstract

With the popularity of online social networks and the growth of SMS and tweets, the rising number of spam messages has become a severe problem: spammers find these platforms easy to reach and post spam messages to trick users into malicious activities. Blocking such messages requires new spam detection technologies. In this paper we build a spam detector prototype using the pre-trained BERT (Bidirectional Encoder Representations from Transformers) model, which classifies messages by modeling their meaning and context. We train the detector on an SMS dataset named V.1 dataset and on UtkMl’s Twitter dataset. To evaluate the model we use the precision, recall and F-measure metrics. We also perform a comparative study of different machine learning and deep learning algorithms against BERT. BERT outperforms all the other algorithms compared, achieving an AUC, F1 score and accuracy of 96.10%, 92 and 91.71%, respectively.
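As an illustration of the evaluation metrics named in the abstract, the following is a minimal sketch of how precision, recall and the F-measure (F1) can be computed for a binary spam/ham task. The function name `precision_recall_f1`, the label strings and the toy predictions are assumptions made for this example; they are not taken from the paper's code or data.

```python
def precision_recall_f1(y_true, y_pred, positive="spam"):
    """Compute precision, recall and F1 for the `positive` class.

    Hypothetical helper for illustration only; not the authors' implementation.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # spam caught
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # ham flagged as spam
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # spam missed

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Toy labels (made up for this sketch, not results from the paper).
y_true = ["spam", "ham", "spam", "ham", "spam", "ham"]
y_pred = ["spam", "ham", "ham", "ham", "spam", "spam"]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Here precision is the fraction of messages flagged as spam that really are spam, recall is the fraction of actual spam that was flagged, and F1 is their harmonic mean.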

Full Text