Abstract

Electronic mail (e-mail) is now one of the most common and fastest forms of communication. However, the growth in the number of e-mail users has led to a dramatic increase in spam over the past few years. Spam is the use of electronic messaging systems to send unsolicited bulk messages. In this paper, e-mail data were classified as ham or spam using supervised learning algorithms. Three classifiers were used: the Naive Bayesian (NB) classifier, the K-nearest neighbor (KNN) classifier, and the Support Vector Machine (SVM) classifier. The SVM was trained with sequential minimal optimization (SMO), an algorithm for solving the quadratic programming (QP) problem that arises during SVM training. The experiment was performed by applying a filtering algorithm to the classifiers, and the results show how each classifier performs before and after filtering. To examine the performance of the selected classification algorithms, the true positive and false positive rates, precision, recall, and F-measure were evaluated. The algorithms also differed in running time. Before the filtering algorithm was applied, KNN and SMO were nearly the best of the three classifiers. After the filtering algorithm was applied, SMO was the best classifier. The data mining tool WEKA was used for this experiment.

Key words: WEKA, classifier, K-nearest neighbor (KNN), support vector machines (SVM), Naive Bayesian (NB), boosting.
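As an illustration of the evaluation measures named above, the following minimal sketch computes them from the four counts of a binary confusion matrix, treating spam as the positive class. It is not the WEKA implementation used in the experiment; the function name `spam_metrics` and the example counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    tp_rate: float    # share of spam correctly flagged (same value as recall)
    fp_rate: float    # share of ham wrongly flagged as spam
    precision: float  # share of flagged messages that really are spam
    recall: float
    f_measure: float  # harmonic mean of precision and recall


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def spam_metrics(tp: int, fp: int, tn: int, fn: int) -> Metrics:
    """Compute the measures from confusion-matrix counts (spam = positive)."""
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    f_measure = _ratio(2 * precision * recall, precision + recall)
    return Metrics(
        tp_rate=recall,
        fp_rate=_ratio(fp, fp + tn),
        precision=precision,
        recall=recall,
        f_measure=f_measure,
    )


if __name__ == "__main__":
    # Hypothetical counts, not results from the paper.
    print(spam_metrics(tp=90, fp=10, tn=180, fn=20))
```

Computing the same measures for each classifier before and after filtering would give the kind of comparison the abstract describes.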
