Effective spam filter based on a hybrid method of header checking and content parsing

Ko‐Tsung Chu, Jyh‐Jian Sheu, Wei‐Pang Yang, Cheng‐Chi Lee, Hua‐Ting Hsu

doi:10.1049/iet-net.2019.0191

Abstract

In recent years, hazardous e-mails have emerged, such as e-mails infected with ‘viruses’ or ‘worms’ that spread destructive programs, and ‘phishing mails’ that defraud users of their e-mail accounts. The volume of spam continues to grow. As spam-related problems become more severe, spam has become an important topic in various research domains. Common filtering methods include black/white lists, rule learning, and methods based on text classification, such as Naive Bayes, support vector machines, boosting trees, multi-agent approaches, and genetic algorithms. Among these, the methods based on text classification are the most widely applied. Moreover, some efficient methods have been proposed that consider only the e-mail's header section, which can improve both operating efficiency and classification efficiency. By applying machine learning techniques and the C4.5 decision tree data mining algorithm, this study proposes an efficient spam filtering method with the following features: (i) a two-phase filtering mechanism that scans mainly the e-mail's header, with the content as an auxiliary check; and (ii) a reduction in false positives. The experimental results show that the authors’ method achieves a high accuracy rate of 98.76%. Compared with some other methods that use the same spam data sets or that are based on deep learning, their method shows excellent performance.

Full Text