Abstract

Email is a ubiquitous communication technology in modern life. The more emails we receive, the more difficult and time-consuming it becomes to sort them. One way to address this problem is to build a system that sorts emails using machine learning. Different machine learning methods and data sampling strategies yield different performance. Ensemble learning combines several learning models into a single model to obtain better performance. In this study, we built a multiclass email classification system that combines learning models, data sampling, and several data classes in order to measure the effect of the Ensemble Bagging and Ensemble Voting methods on the macro-averaged F1 score, and compared them with non-ensemble models. The results show that the sensitivity of Naïve Bayes to imbalanced data is mitigated by the Ensemble Bagging and Ensemble Voting methods, with a ∆P (delta performance) in the range 0.0001–0.0018. Logistic Regression achieves a ∆P in the range 0.0001–0.00015 with Ensemble Bagging and Ensemble Voting. Decision Tree shows the lowest performance compared to the others, with a ∆P of -0.01.
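
The abstract compares non-ensemble base learners (Naïve Bayes, Logistic Regression, Decision Tree) against their Ensemble Bagging counterparts and a combined Ensemble Voting model, evaluated by macro-averaged F1. The sketch below illustrates that comparison using scikit-learn; it is only a minimal illustration, not the study's actual code, and the dataset, preprocessing, and hyperparameters are placeholder assumptions (a public corpus stands in for the email data).

```python
# Minimal sketch of the compared setups, assuming scikit-learn.
# The study's email corpus and settings are not reproduced here.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier, VotingClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

# Stand-in multiclass text data in place of the study's email dataset.
data = fetch_20newsgroups(subset="all", remove=("headers", "footers", "quotes"))
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42, stratify=data.target
)

base_models = [
    ("nb", MultinomialNB()),
    ("lr", LogisticRegression(max_iter=1000)),
    ("dt", DecisionTreeClassifier()),
]

candidates = {}
for name, clf in base_models:
    # Non-ensemble baseline.
    candidates[name] = make_pipeline(TfidfVectorizer(), clf)
    # Ensemble Bagging: the same base learner trained on bootstrap resamples.
    candidates[f"bagged_{name}"] = make_pipeline(
        TfidfVectorizer(), BaggingClassifier(estimator=clf, n_estimators=10)
    )

# Ensemble Voting: the three base learners combined by majority vote.
candidates["voting"] = make_pipeline(
    TfidfVectorizer(), VotingClassifier(estimators=base_models, voting="hard")
)

# Macro-averaged F1 gives each class equal weight, which is why it is used
# when the class distribution is imbalanced; ∆P would be the difference
# between an ensemble's score and its non-ensemble baseline.
for name, model in candidates.items():
    model.fit(X_train, y_train)
    macro_f1 = f1_score(y_test, model.predict(X_test), average="macro")
    print(f"{name}: macro-averaged F1 = {macro_f1:.4f}")
```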
