Cervical cancer is one of the deadliest diseases affecting women worldwide. It is caused by long-term infection of the skin and mucosal cells of the genital area. Most troubling, it shows no symptoms at onset. Machine learning has the potential to support the diagnosis and prognosis of cervical cancer by helping to detect it at an early stage. In this paper, we analyze several supervised machine learning techniques for early detection of cervical cancer. The models were trained and evaluated on a cervical cancer dataset from the UCI repository comprising 858 patients, with 36 risk factors and one outcome variable. Six classification algorithms were applied: an artificial neural network, a Bayesian network, a support vector machine (SVM), a random tree, a logistic tree, and an XGBoost tree. All models were trained with and without feature selection to compare the performance and accuracy of the classifiers. Three feature selection algorithms were used: (i) relief rank, (ii) a wrapper method, and (iii) LASSO regression. The maximum accuracy, 94.94%, was obtained using XGBoost with the complete feature set. We also observe that, for this dataset, models trained with feature selection perform better in some cases. Machine learning has been shown to have advantages over traditional statistical models in handling the complexity of large-scale data and uncovering prognostic features. It offers considerable potential for clinical use and for improving the treatment of cervical cancer. However, the limitations of prediction studies and models, such as simplified and incomplete information, overfitting, and lack of interpretability, suggest that further work is needed to improve the accuracy, reliability, and practicality of clinical outcome prediction.
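To illustrate the relief-based ranking mentioned among the feature selection algorithms, the following is a minimal sketch of the basic Relief weighting scheme for binary labels. It is not the implementation used in this study: the function names `relief_scores` and `make_toy_data`, the range-normalized distance, and the two-feature synthetic data are all assumptions made for illustration, and the paper's exact Relief variant is not specified in this abstract.

```python
import random


def relief_scores(X, y, n_iter=None, seed=0):
    """Basic Relief feature weights for numeric features and binary labels.

    Illustrative sketch only. Assumes every class has at least two samples.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])

    # Normalize per-feature differences by the feature range so each lies in [0, 1].
    spans = []
    for j in range(d):
        col = [row[j] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(j, a, b):
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):
        return sum(diff(j, a, b) for j in range(d))

    m = n if n_iter is None else n_iter
    weights = [0.0] * d
    for _ in range(m):
        i = rng.randrange(n)
        same = [k for k in range(n) if k != i and y[k] == y[i]]
        other = [k for k in range(n) if y[k] != y[i]]
        hit = min(same, key=lambda k: dist(X[i], X[k]))    # nearest same-class sample
        miss = min(other, key=lambda k: dist(X[i], X[k]))  # nearest other-class sample
        for j in range(d):
            # Reward features that separate the classes, penalize those that
            # differ between neighbors of the same class.
            weights[j] += (diff(j, X[i], X[miss]) - diff(j, X[i], X[hit])) / m
    return weights


def make_toy_data(n=60, seed=1):
    """Synthetic data (hypothetical): feature 0 tracks the label, feature 1 is noise."""
    rng = random.Random(seed)
    X, y = [], []
    for idx in range(n):
        label = idx % 2
        X.append([label + rng.uniform(-0.2, 0.2), rng.uniform(0.0, 1.0)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    scores = relief_scores(X, y)
    ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    print("weights:", scores)
    print("features ranked by weight:", ranking)
```

On the toy data, the informative feature should receive a clearly higher weight than the noise feature. In a setting like the one described here, the top-ranked risk factors would then be passed to each classifier, and the result compared against training on all 36 features.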