Abstract

Cervical cancer has emerged as a leading cause of premature death among women, and around 85% of cases occur in underdeveloped countries. Several risk factors are associated with the disease. This study describes a novel predictive model that uses early screening results and risk trends from individual health records to forecast the prognoses of cervical cancer patients. We use machine learning classification techniques to investigate the risk factors for cervical cancer, and a voting method to evaluate all models and select the most appropriate one. The dataset used in this study contains missing values and is significantly imbalanced, so Random Oversampling was used as the sampling method. We used Principal Component Analysis (PCA) and XGBoost feature selection to determine the most important features. For prediction, we used several machine learning classifiers: Support Vector Machines (SVM), Random Forest (RF), k-Nearest Neighbors (KNN), Decision Trees (DT), Naive Bayes (NB), Logistic Regression (LR), AdaBoost (AdB), Gradient Boosting (GB), Multilayer Perceptron (MLP), and Nearest Centroid Classifier (NCC). To demonstrate the efficacy of the proposed model, we compared the classifiers on accuracy, sensitivity, and specificity. Combining Random Oversampling with an ensemble ML method, hard voting on RF and MLP, achieved 99.19% accuracy. The results indicate that the ensemble ML classifier (hard voting) handles this classification problem better when the number of features is reduced and the severe class imbalance is addressed.
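
The two core steps named above, Random Oversampling and hard voting, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names `random_oversample` and `hard_vote`, the fixed seed, and the tie-breaking rule (smallest label wins) are assumptions made for illustration, and the study itself does not state how ties between RF and MLP were resolved.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))  # sample with replacement
            y_out.append(label)
    return X_out, y_out


def hard_vote(predictions):
    """Majority vote per sample. `predictions` holds one list of labels
    per model. Ties go to the smallest label (an assumed rule)."""
    voted = []
    for sample_votes in zip(*predictions):
        counts = Counter(sample_votes)
        top = max(counts.values())
        voted.append(min(lab for lab, c in counts.items() if c == top))
    return voted


# Toy data: four negative records and one positive record.
X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = [0, 0, 0, 0, 1]
X_bal, y_bal = random_oversample(X, y)

# Hypothetical label predictions from two models on two samples.
combined = hard_vote([[1, 0], [0, 0]])
```

With only two voters, as in the RF and MLP ensemble, every disagreement is a tie, so the tie-breaking rule decides the outcome for those samples.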
