Risk Factors of Cervical Cancer using Classification in Data Mining

Nazim Razali, Nurul Atieqah Ibrahim, Aida Mustapha, Mohd Helmy Abd Wahab, Salama A Mostafa

doi:10.1088/1742-6596/1529/2/022102

Abstract

According to the World Health Organization, cervical cancer is the fourth most frequent cancer, with a high mortality rate, affecting women all around the world, especially in low- and middle-income countries. As the fields of computer science and information technology grow, research on analysing medical datasets, such as those for diabetes, cervical cancer and liver disease, grows as well. This paper studies classification techniques in data mining on a cervical cancer risk factor dataset. The classification techniques Naive Bayes (NB), C4.5 Decision Tree (C4.5), k-Nearest Neighbors (kNN), Sequential Minimal Optimization (SMO), Random Forest Decision Tree (RF), Multilayer Perceptron (MLP) Neural Network and Simple Logistic Regression (SLR) were used to classify each record in the dataset as healthy or cancer for cervical cancer diagnosis. The dataset requires an intensive pre-processing phase because it is imbalanced and has many missing values. Classification performance was evaluated using 10-fold cross-validation, with accuracy, precision and recall computed from the confusion matrix as the evaluation metrics for all classification techniques.
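As an illustration of the evaluation metrics named above, the following is a minimal Python sketch of how accuracy, precision and recall can be derived from a binary confusion matrix. It is not the authors' implementation; the function names, the `Metrics` type and the example labels (1 = cancer, 0 = healthy) are hypothetical and chosen only for this example.

```python
from typing import NamedTuple


class Metrics(NamedTuple):
    accuracy: float
    precision: float
    recall: float


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn), treating `positive` as the cancer class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1  # cancer correctly predicted
        elif p == positive:
            fp += 1  # healthy record predicted as cancer
        elif t == positive:
            fn += 1  # cancer record predicted as healthy
        else:
            tn += 1  # healthy correctly predicted
    return tp, fp, fn, tn


def metrics_from_counts(tp, fp, fn, tn):
    """Compute accuracy, precision and recall; empty denominators yield 0.0."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return Metrics(accuracy, precision, recall)


# Hypothetical labels for ten records (assumption: 1 = cancer, 0 = healthy).
y_true = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0, 0, 0]

counts = confusion_counts(y_true, y_pred)
print(counts)
print(metrics_from_counts(*counts))
```

On an imbalanced dataset such as this one, accuracy alone can look high even when few cancer cases are detected, which is one reason precision and recall are reported alongside it. Under k-fold cross-validation, one common option is to sum the per-fold counts before computing the metrics.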

Full Text