Abstract

Identifying unsolicited emails, or spam, in a collection of email files has become a challenging area of research. A robust classifier is judged not only by its accuracy but also by its false positive rate. Recently, evolutionary algorithms and ensemble-of-classifiers methods have gained popularity in this domain. To develop an accurate and sensitive spam classifier, this research studies evolutionary-algorithm-based classifiers, namely the Genetic Algorithm (GA) and Genetic Programming (GP), combined with ensemble techniques. Two publicly available datasets, Enron and SpamAssassin, are used for testing, using the most informative features selected by the Greedy Stepwise Search algorithm. Results show that without an ensemble, GA performs better than GP, but once an ensemble of many weak classifiers is built, GP outperforms GA with significantly higher accuracy. Greedy Stepwise Feature Search is also found to be a strong feature selection method in this application domain. Ensemble-based GP performs well not only in classification accuracy but also in achieving low false positive rates, which is an important criterion for building a robust spam classifier.
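To make two of the evaluation ideas above concrete, combining many weak classifiers into an ensemble and measuring the false positive rate, here is a minimal Python sketch. It assumes simple majority voting over 0/1 predictions (the abstract does not say how the GA or GP ensemble members are combined, so this is an illustrative stand-in, not the authors' method) and treats spam as the positive class, so the false positive rate is the fraction of legitimate emails wrongly flagged as spam. The names `majority_vote` and `accuracy_and_fpr` are hypothetical.

```python
from typing import Sequence

# Label convention assumed for this sketch: spam is the positive class.
SPAM, HAM = 1, 0


def majority_vote(predictions: Sequence[Sequence[int]]) -> list[int]:
    """Combine per-classifier 0/1 predictions by simple majority.

    predictions[k][i] is classifier k's label for email i.
    Ties fall back to HAM, which errs on the side of fewer false positives.
    """
    n_classifiers = len(predictions)
    n_samples = len(predictions[0])
    combined = []
    for i in range(n_samples):
        spam_votes = sum(p[i] for p in predictions)
        # Strictly more than half of the classifiers must say SPAM.
        combined.append(SPAM if 2 * spam_votes > n_classifiers else HAM)
    return combined


def accuracy_and_fpr(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[float, float]:
    """Return (accuracy, false positive rate) with SPAM as the positive class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == SPAM and t == SPAM:
            tp += 1
        elif p == SPAM and t == HAM:
            fp += 1  # legitimate email wrongly flagged as spam
        elif p == HAM and t == HAM:
            tn += 1
        else:
            fn += 1  # spam that slipped through
    accuracy = (tp + tn) / len(y_true)
    # FPR = FP / (FP + TN): share of legitimate emails misclassified as spam.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, fpr
```

As a usage example with three hypothetical weak classifiers, each partly wrong, against true labels `[1, 1, 0, 0, 1, 0]`: `majority_vote([[1, 0, 0, 1, 1, 0], [1, 1, 1, 0, 0, 0], [0, 1, 0, 0, 1, 1]])` returns `[1, 1, 0, 0, 1, 0]`, matching every label, whereas the first classifier alone has an accuracy of 4/6 and a false positive rate of 1/3.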
