Trigonometric words ranking model for spam message classification

Suha Mohammed Hadi, Ali Hakem Alsaeedi, Mustafa Musa Jaber, Karrar Hameed Abdulkareem, Riyadh Rahef Nuiaa, Zaid Abdi Alkareem Alyasseri, Mazin Abed Mohammed, Dhiah Al‐Shammary

doi:10.1049/ntw2.12063

Abstract

The significant increase in the volume of fake (spam) messages has created an urgent need to develop and implement a robust anti‐spam method. Several current anti‐spam systems rely mainly on the word order of a message to decide whether it is spam, so they fail to predict the correct message type when the word order changes. This paper proposes a new anti‐spam filtering framework that does not depend on the position of words in the message, called the Trigonometric Words Ranking Model (TWRM). TWRM restricts spammers over the network by measuring a theta angle, which expresses the relationship between message weight and spam. It classifies a message by calculating the rank of each word, and these ranks place the message in the correct class. The rank of a word is derived from its frequency in the entire data category. The proposed method is applied to three spam datasets: UCI spam email, Enron spam, and TREC spam data. The results show that the proposed model is more efficient than the Minhash and vector space models. Moreover, TWRM provides better retrieval time and defence, reflected in an accuracy of 99.64%, which is higher than that of Minhash (88.79%) and the vector space model (92.59%).
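
To make the idea in the abstract concrete, the following is a minimal Python sketch of an order-independent, rank-based classifier. It is an illustration under stated assumptions, not the authors' formulation: the abstract does not define the rank or the theta angle, so the rank here is assumed to be a word's relative frequency in each class, and theta is assumed to be the angle of the (ham weight, spam weight) vector, with a hypothetical 45 degree decision threshold. The function names `train_word_ranks`, `theta_degrees`, and `classify` are likewise hypothetical.

```python
import math
from collections import Counter


def train_word_ranks(messages, labels):
    """Map each word to (spam_rank, ham_rank).

    Assumption: a word's rank in a class is its relative frequency
    over all words of that class in the training data.
    """
    spam_counts, ham_counts = Counter(), Counter()
    for text, label in zip(messages, labels):
        target = spam_counts if label == "spam" else ham_counts
        target.update(text.lower().split())
    spam_total = sum(spam_counts.values()) or 1  # avoid division by zero
    ham_total = sum(ham_counts.values()) or 1
    vocab = set(spam_counts) | set(ham_counts)
    return {
        w: (spam_counts[w] / spam_total, ham_counts[w] / ham_total)
        for w in vocab
    }


def theta_degrees(text, ranks):
    """Angle of the message's (ham weight, spam weight) vector.

    Assumption: the message weight per class is the sum of its word
    ranks, so the result does not depend on word order (up to
    floating-point rounding). Unknown words contribute nothing.
    """
    spam_weight = ham_weight = 0.0
    for word in text.lower().split():
        spam_rank, ham_rank = ranks.get(word, (0.0, 0.0))
        spam_weight += spam_rank
        ham_weight += ham_rank
    return math.degrees(math.atan2(spam_weight, ham_weight))


def classify(text, ranks, threshold=45.0):
    """Label a message as spam when theta exceeds the assumed threshold."""
    return "spam" if theta_degrees(text, ranks) > threshold else "ham"


# Toy training data, invented purely for illustration.
train_messages = [
    "win free prize now",
    "free cash prize",
    "meeting at noon tomorrow",
    "lunch at noon",
]
train_labels = ["spam", "spam", "ham", "ham"]
ranks = train_word_ranks(train_messages, train_labels)

# Reordering the words leaves the predicted class unchanged.
print(classify("win free prize", ranks))
print(classify("prize free win", ranks))
```

Because each class weight is a plain sum over the words of the message, shuffling the words cannot change the predicted class, which is the property the abstract contrasts with order-dependent filters.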

Full Text