Abstract

Classification is one way to organize texts so that texts with the same content are grouped into the same category. A well-known text classification method is Naive Bayes. Naive Bayes is computationally efficient and gives good prediction results; however, it performs poorly when classifying unbalanced datasets. A modification of Naive Bayes designed to overcome this weakness is known as the Transformed Complement Naive Bayes (TCNB) method. In this research, TCNB was applied to an unbalanced spam e-mail dataset of 2893 e-mails in total: 481 in the spam class and 2412 in the legitimate e-mail class. Classification was performed both with and without cross-validation. With cross-validation, the number of folds was varied from k = 2 to k = 10. Without cross-validation, the data were split into 80% training data and 20% testing data. The results showed that TCNB with cross-validation achieved its best accuracy, 93.917%, at k = 10, while TCNB without cross-validation achieved a best accuracy of 92.760%. It can therefore be concluded that TCNB handles unbalanced datasets with good prediction accuracy.
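The abstract names TCNB and both evaluation protocols without detailing them. What follows is a minimal standard-library Python sketch under stated assumptions, not the authors' implementation. It assumes the commonly described TCNB pipeline: a log term-frequency transform, an inverse-document-frequency transform, document-length normalization, and per-class estimates computed from the *complement* (all other classes) with additive smoothing `alpha`. It also assumes whitespace tokenization and plain, non-stratified k-fold splits. The function names `train_tcnb`, `predict_tcnb`, `k_fold_accuracy` and `holdout_accuracy` are hypothetical.

```python
import math
import random
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; real spam filters do more.
    return text.lower().split()


def train_tcnb(docs, labels, alpha=1.0):
    """Sketch of Transformed Complement Naive Bayes training (assumed variant).

    Returns (classes, weights), where weights[c][w] is the log of the smoothed
    probability of word w estimated from the documents NOT in class c.
    """
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = set().union(*counts)
    n_docs = len(docs)
    df = Counter(w for c in counts for w in c)  # document frequency per word

    transformed = []
    for c in counts:
        # TF transform log(1 + n), then IDF transform log(N / df).
        vec = {w: math.log1p(n) * math.log(n_docs / df[w]) for w, n in c.items()}
        # Length normalization: scale each document to unit Euclidean norm.
        norm = math.sqrt(sum(v * v for v in vec.values()))
        if norm > 0:
            vec = {w: v / norm for w, v in vec.items()}
        transformed.append(vec)

    classes = sorted(set(labels))
    weights = {}
    for cls in classes:
        # Complement estimate: accumulate transformed weights of the other classes.
        comp = Counter()
        for vec, y in zip(transformed, labels):
            if y != cls:
                comp.update(vec)
        total = sum(comp.values()) + alpha * len(vocab)
        weights[cls] = {w: math.log((comp[w] + alpha) / total) for w in vocab}
    return classes, weights


def predict_tcnb(model, text):
    classes, weights = model
    t = Counter(tokenize(text))  # raw word counts of the test document

    def complement_fit(cls):
        # How well the complement of cls explains the text; unseen words are skipped.
        return sum(n * weights[cls][w] for w, n in t.items() if w in weights[cls])

    # The predicted class is the one whose complement fits the text worst.
    return min(classes, key=complement_fit)


def _hits(train_idx, test_idx, docs, labels):
    # Train on train_idx, return the number of correct predictions on test_idx.
    model = train_tcnb([docs[i] for i in train_idx], [labels[i] for i in train_idx])
    return sum(predict_tcnb(model, docs[i]) == labels[i] for i in test_idx)


def k_fold_accuracy(docs, labels, k, seed=0):
    # Plain (non-stratified) k-fold: every document is tested exactly once,
    # and accuracy is pooled over all folds.
    idx = list(range(len(docs)))
    random.Random(seed).shuffle(idx)
    hits = 0
    for f in range(k):
        test = idx[f::k]
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        hits += _hits(train, test, docs, labels)
    return hits / len(docs)


def holdout_accuracy(docs, labels, train_frac=0.8, seed=0):
    # Single shuffled split, e.g. 80% training / 20% testing.
    idx = list(range(len(docs)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * train_frac)
    test = idx[cut:]
    return _hits(idx[:cut], test, docs, labels) / len(test)
```

Calling `k_fold_accuracy` for k = 2 through 10 mirrors the cross-validation protocol in the abstract, and `holdout_accuracy` mirrors the 80/20 split. Because the e-mail data are unbalanced (481 spam versus 2412 legitimate), a stratified split that preserves this ratio in every fold may be preferable. The abstract does not say which kind of split was used.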
