Abstract

E-mail has largely displaced traditional communication systems because it is convenient, economical, fast, and easy to use. A major bottleneck in electronic communication is the mass dissemination of unwanted and harmful messages, known as spam. In this paper, a novel spam filtering framework (NSFF) is proposed, based on particle swarm optimization, fuzzy logic control, F-score, and support vector machine (SVM). We propose a fuzzy adaptive particle swarm optimization (FAPSO) to find an optimal feature subset. To identify a feature subset within a large dataset contaminated with high-dimensional noise, the proposed method is divided into three stages: core feature subset selection, feature subset selection, and spam filtering. In the first stage, F-score is used to calculate the importance of each feature and to construct a core feature set, from which a number of core feature subsets are obtained. In the second stage, FAPSO is initialized from the core feature subset and adjusted adaptively via fuzzy logic control, yielding an optimal feature subset. In the final stage, the input e-mails are classified by an SVM using the optimal feature subset. Three publicly available benchmark corpora for spam filtering, PU1, Ling-Spam, and SpamAssassin, are used in our experiments. The numerical results and statistical analysis show that the proposed approach is capable of finding an optimal feature subset from a large, noisy data set. In addition, NSFF achieves significantly better prediction accuracy than the other methods while using a smaller subset of features.
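As an illustration of the first stage only, the following minimal sketch ranks features by F-score and keeps the top k as a core subset. The abstract does not state the F-score formula, so the code assumes the commonly used two-class definition (squared distances of the class means from the overall mean, divided by the sum of the within-class sample variances). The function names `f_score`, `rank_features`, and `core_subset`, the label convention (1 = spam, 0 = legitimate), and the cut-off `k` are hypothetical and not taken from the paper.

```python
from statistics import mean, variance


def f_score(pos, neg):
    """F-score of one feature from its values in the spam (pos) and
    legitimate (neg) classes. Assumed two-class definition; each class
    needs at least two samples for the sample variance."""
    overall = mean(pos + neg)
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    within = variance(pos) + variance(neg)  # sample variances (n - 1)
    if within == 0:
        # No spread inside either class: perfectly separating if the
        # class means differ, uninformative otherwise.
        return float("inf") if between > 0 else 0.0
    return between / within


def rank_features(X, y):
    """Return (feature indices sorted by descending F-score, scores).
    X is a list of rows; y holds 1 for spam and 0 for legitimate mail."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(f_score(pos, neg))
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


def core_subset(X, y, k):
    """Indices of the k highest-scoring features (hypothetical cut-off)."""
    order, _ = rank_features(X, y)
    return order[:k]


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 does not.
    X = [[5, 1], [6, 2], [1, 1], [0, 2]]
    y = [1, 1, 0, 0]
    print(core_subset(X, y, k=1))
```

In a pipeline like the one described, such a core subset would presumably seed the initial FAPSO particles, with the SVM evaluating candidate subsets; those later stages are not sketched here because the abstract gives no detail on them.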
