Classification of Spam Mail using different machine learning algorithms

Aditya Shrivastava, Rachana Dubey

doi:10.1109/icacat.2018.8933787

Abstract

Email is essential for communication today, and its importance grows as the number of internet users increases. Spam is a major problem that researchers continue to analyze and try to reduce. Spam emails arrive in bulk and can contain trojans, viruses, and malware, and can be used for phishing attacks. The problem arises when large numbers of unwanted emails come from unknown sites and the user must be told whether a received email is spam or ham. This paper classifies incoming emails as spam or ham using several classification techniques, so that spam can be identified and removed. We apply the Naive Bayes classifier, which is based on posterior probability, and the decision tree algorithms Random Tree, REPTree, Random Forest, and J48. The UCI Spambase dataset, a benchmark dataset with 58 attributes and 4601 instances, is used to identify spam. Weka is used for the implementation and the analysis of results; the classification algorithms are run in Weka's classification phase. Such classification plays an important role in removing emails that carry viruses, trojans, and malware, or that link to websites used for phishing attacks and fraudulent attempts. Feature selection is applied to the dataset for both the training-set and cross-validation tests, using the CFS subset evaluation method with best-first search. To classify spam, we use two test options available under the classifier settings in Weka: training set and cross-validation. With the training-set option, the same data is used for both training and testing; with cross-validation, the training data is split into a number of folds. Using the training set, Random Tree gives the best result for spam classification.
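The two test options described in the abstract can be illustrated with a minimal Python sketch. It is not the Weka workflow used in the paper. It assumes a toy two-feature dataset generated at random instead of Spambase, a simple nearest-centroid classifier in place of the Naive Bayes and decision tree classifiers, and 10 folds (the abstract does not state the number of folds). All function names are hypothetical.

```python
import random
from statistics import mean


def train_centroids(rows, labels):
    """Fit a nearest-centroid classifier: one mean feature vector per class."""
    by_class = {}
    for x, y in zip(rows, labels):
        by_class.setdefault(y, []).append(x)
    return {y: [mean(col) for col in zip(*xs)] for y, xs in by_class.items()}


def predict(centroids, x):
    """Return the class whose centroid is closest to x (squared distance)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: sq_dist(centroids[y]))


def training_set_accuracy(rows, labels):
    """Training-set option: fit and score on the same data."""
    model = train_centroids(rows, labels)
    hits = sum(predict(model, x) == y for x, y in zip(rows, labels))
    return hits / len(rows)


def cross_validation_accuracy(rows, labels, k=10, seed=0):
    """Cross-validation option: each fold is scored by a model fit on the rest."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering all rows
    hits = 0
    for fold in folds:
        held_out = set(fold)
        train_idx = [i for i in idx if i not in held_out]
        model = train_centroids([rows[i] for i in train_idx],
                                [labels[i] for i in train_idx])
        hits += sum(predict(model, rows[i]) == labels[i] for i in fold)
    return hits / len(rows)


# Toy data (assumption): 50 "ham" points around (0, 0), 50 "spam" around (3, 3).
rng = random.Random(42)
rows = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
rows += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(50)]
labels = ["ham"] * 50 + ["spam"] * 50

train_acc = training_set_accuracy(rows, labels)
cv_acc = cross_validation_accuracy(rows, labels, k=10)
print(f"training-set accuracy:     {train_acc:.3f}")
print(f"10-fold cross-validation:  {cv_acc:.3f}")
```

The only difference between the two functions is which rows are scored: the training-set option scores rows the model has already seen, while cross-validation scores each row with a model that was fit without it.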

Full Text