Abstract

Over time, technological advancements have had an immense effect on every aspect of life, including travel, office work, music, healthcare, and communication. In the past, people communicated over telephone lines. Wireless technology, which offers far more functionality than wired telephone technology, now prevails. SMS is mostly used by spammers and advertising firms to communicate with the general public and distribute company pamphlets. This helps explain why spam is reported to make up over 60% of the SMS messages sent and received every day. Although these spam messages irritate users and occasionally con unsuspecting ones, spammers and advertising businesses benefit handsomely from them. This paper proposes a method for classifying SMS messages as ham or spam using supervised machine learning. Features are extracted from the data using bag-of-words and Term Frequency-Inverse Document Frequency (TF-IDF). The class imbalance in the SMS dataset we used was addressed by applying both oversampling and undersampling. A support vector classifier, gradient boosting machine, random forest, Gaussian Naive Bayes, and logistic regression are trained on the spam and ham SMS data, and their performance is assessed using F1 score, accuracy, precision, and recall. According to the experimental findings, the random forest classifies spam and ham SMS most accurately, doing so correctly 99% of the time.
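As a rough illustration of the feature extraction and resampling steps named above, the following minimal sketch computes bag-of-words counts and TF-IDF weights and then randomly oversamples the minority class. It is not the paper's implementation: the four messages and their labels are made up, whitespace tokenization is a simplification, the TF-IDF variant (term frequency normalized by message length, unsmoothed logarithmic IDF) is an assumption, and the helper names are hypothetical. Undersampling and the classifiers themselves are left out.

```python
import math
import random
from collections import Counter


def tokenize(text):
    # Simplifying assumption: lowercase and split on whitespace.
    return text.lower().split()


def bag_of_words(docs):
    # Build a sorted vocabulary and one count vector per document.
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    counts = [Counter(tokenize(doc)) for doc in docs]
    return vocab, [[c[t] for t in vocab] for c in counts]


def tf_idf(docs):
    # tf = count / document length; idf = ln(N / document frequency).
    # This is one common variant, assumed here for illustration.
    vocab, bow = bag_of_words(docs)
    n_docs = len(docs)
    df = [sum(1 for row in bow if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    weights = []
    for row in bow:
        total = sum(row)
        weights.append(
            [(c / total) * w if total else 0.0 for c, w in zip(row, idf)]
        )
    return vocab, weights


def random_oversample(samples, labels, seed=0):
    # Duplicate randomly chosen minority-class samples until every
    # class is as large as the largest one.
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Made-up toy data: one spam message and three ham messages.
messages = [
    "win a free prize now",
    "are we meeting for lunch",
    "see you at lunch",
    "call me after the meeting",
]
labels = ["spam", "ham", "ham", "ham"]

vocab, weights = tf_idf(messages)
balanced_x, balanced_y = random_oversample(weights, labels)
print(Counter(balanced_y))
```

The balanced feature vectors could then be passed to any of the classifiers listed in the abstract. In practice, resampling is usually applied to the training split only, so that duplicated messages do not leak into the evaluation data.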

