Abstract

<span lang="EN-US">Data imbalance is a common problem in applications of machine learning and data mining, and it often affects the most essential and needed case entities. Two approaches to this problem are the data-level approach and the algorithm-level approach. This study aims to find the best model for a pap smear dataset by combining a data-level approach with an algorithm-level approach to handle imbalanced data. Laboratory data are mostly small and imbalanced, and in almost every case the minority entities are the most important and needed. The data-level approach used in this study is over-sampling with the synthetic minority oversampling technique-nominal (SMOTE-N) and adaptive synthetic-nominal (ADASYN-N) algorithms. The algorithm-level approach is an ensemble classifier using AdaBoost and bagging with the classification and regression tree (CART) as the base learner. In the experiments, the best model in terms of accuracy, precision, recall, and F-measure was obtained using ADASYN-N with AdaBoost-CART.</span>
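
The over-sampling step named in the abstract can be illustrated with a minimal sketch of nominal over-sampling in the spirit of SMOTE-N. This is not the study's implementation: it assumes a binary problem, uses plain Hamming distance in place of the value difference metric that SMOTE-N normally relies on, and the function names, feature values, and class labels are hypothetical. ADASYN-N, which additionally generates more synthetic rows for harder minority samples, is not shown.

```python
import random
from collections import Counter


def hamming(a, b):
    """Count the positions at which two nominal feature vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def oversample_nominal(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority rows by per-feature majority vote.

    Assumption: Hamming distance stands in for the value difference
    metric used by SMOTE-N, to keep the sketch dependency-free.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base row.
        others = [row for row in minority if row is not base]
        neighbours = sorted(others, key=lambda row: hamming(base, row))[:k]
        group = [base] + neighbours
        # Each feature takes the most common value within the group.
        new_row = tuple(
            Counter(column).most_common(1)[0][0] for column in zip(*group)
        )
        synthetic.append(new_row)
    return synthetic


def balance(rows, labels, minority_label, k=3, seed=0):
    """Add synthetic minority rows until both classes have equal size."""
    minority = [r for r, y in zip(rows, labels) if y == minority_label]
    n_majority = len(rows) - len(minority)
    extra = oversample_nominal(minority, n_majority - len(minority), k, seed)
    return rows + extra, labels + [minority_label] * len(extra)


# Hypothetical nominal features and labels, for illustration only.
rows = [("hi", "no", "thin")] * 8 + [
    ("lo", "yes", "thick"),
    ("lo", "yes", "thin"),
    ("lo", "no", "thick"),
]
labels = ["normal"] * 8 + ["abnormal"] * 3
new_rows, new_labels = balance(rows, labels, "abnormal", k=2)
```

The balanced rows would then be passed to an ensemble classifier such as AdaBoost or bagging over CART trees, which is the algorithm-level half of the combination the abstract describes.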
