Abstract

Cervical cancer is among the cancers most successfully treated when diagnosed early. This study aims to detect and classify the disease by applying data mining methods to a digitized dataset obtained from Pap smear tests. A two-stage architecture is proposed for the diagnosis of cervical cancer. In the first stage, missing data were removed from the dataset; in the second stage, a new dataset was obtained by using the Synthetic Minority Oversampling Technique (SMOTE) algorithm to balance the target classes. The majority voting (MV) method was applied to reduce the dataset's four target variables to a single target variable. Five data mining algorithms, Artificial Neural Network (ANN), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), and K-Nearest Neighbors (KNN), were applied to both datasets for the diagnosis of cervical cancer. The results obtained from the original dataset and from the SMOTE-balanced dataset were compared. In terms of classification success and F-score, ANN was the best-performing method, and the majority-voted target variable in the SMOTE-balanced dataset gave the most successful result. The experimental results showed that using MV and SMOTE together increased classification success from 93% to 99%.
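
The two preprocessing ideas named above, majority voting over several target variables and SMOTE-style oversampling of the minority class, can be illustrated with a minimal standard-library Python sketch. This is not the authors' implementation. The function names, the toy data, the tie-breaking rule (`tie_value`), and the neighbour count `k` are all assumptions made for illustration; the abstract does not state how ties among the four targets were resolved or which SMOTE settings were used.

```python
import random


def majority_vote(labels, tie_value=1):
    """Collapse several binary target labels into a single label.

    tie_value is an assumption: with four targets a 2-2 tie is possible,
    and the abstract does not say how such ties were handled.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == negatives:
        return tie_value
    return 1 if positives > negatives else 0


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic samples in the spirit of SMOTE.

    Each synthetic sample is a random point on the segment between a
    minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def balance(X, y, seed=0):
    """Oversample the minority class of a binary problem until both classes match."""
    minority_label = min(set(y), key=y.count)
    minority = [x for x, t in zip(X, y) if t == minority_label]
    n_new = (len(y) - len(minority)) - len(minority)
    new_points = smote_like(minority, n_new, seed=seed)
    return X + new_points, y + [minority_label] * n_new


# Toy data (hypothetical): six records, two features, four binary targets each.
X = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1], [0.3, 0.3], [0.8, 0.9], [0.0, 0.4]]
targets = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [1, 1, 0, 0],  # 2-2 tie, resolved by tie_value
    [0, 0, 0, 0],
]

y = [majority_vote(row) for row in targets]
X_balanced, y_balanced = balance(X, y)
```

After `balance`, both classes have the same number of samples, and each synthetic sample lies between two existing minority samples. In practice a maintained library implementation of SMOTE would normally be preferred over a hand-written one.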
