Semantic-Based Feature Reduction Approach for E-mail Classification

Eman M Bahgat, Ibrahim F Moawad

doi:10.1007/978-3-319-48308-5_6

Abstract

E-mail is one of the most important applications for computer users because of its efficiency and low cost. However, some users exploit it to send spam, which has become a severe problem that greatly affects users’ performance. E-mail filtering is an important approach to identifying spam. In this paper, we propose a novel semantic-based approach for e-mail filtering, built on different machine learning algorithms. The approach analyses the content of each e-mail and assigns a weight to each term, which helps classify the message as spam or ham. We enhance traditional e-mail filtering approaches by applying a semantic-based feature reduction model that uses the WordNet ontology to handle the high dimensionality of the feature space. Experiments conducted on the Enron dataset yielded accuracies of up to 0.96. A comparative study among different classifiers is also presented to demonstrate the efficiency of the proposed approach. The classifiers are Naive Bayes (NB), Support Vector Machine (SVM), Logistic Regression, J48 and Random Forest. Logistic Regression achieved the best accuracy, 0.96, followed by NB and SVM, which had similar accuracies of about 0.93. Finally, Random Forest and J48 had the lowest accuracies, 0.85 and 0.87 respectively.
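
The idea of semantic feature reduction can be illustrated with a minimal sketch. It is not the authors' implementation: the `TOY_CONCEPTS` dictionary is a hypothetical stand-in for WordNet lookups, the function name `reduce_features` is invented for illustration, and raw term counts are assumed as weights because the abstract does not specify the weighting scheme.

```python
from collections import Counter

# Hypothetical stand-in for WordNet: each term maps to a shared concept label.
# A real pipeline would obtain these groupings from the WordNet ontology.
TOY_CONCEPTS = {
    "cash": "money",
    "funds": "money",
    "money": "money",
    "free": "free",
    "complimentary": "free",
    "meeting": "meeting",
    "conference": "meeting",
}


def reduce_features(tokens):
    """Merge terms that share a concept and sum their counts (assumed weights)."""
    weights = Counter()
    for token in tokens:
        term = token.lower()
        # Terms without a known concept are kept as their own feature.
        weights[TOY_CONCEPTS.get(term, term)] += 1
    return dict(weights)


email_tokens = ["Free", "cash", "and", "complimentary", "funds"]
reduced = reduce_features(email_tokens)
```

In this toy message, five distinct terms collapse into three features, because "cash" and "funds" share one concept and "free" and "complimentary" share another. The reduced feature vector would then be passed to a classifier such as Logistic Regression.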

Full Text