Abstract
Cervical cancer is the fourth most commonly diagnosed cancer and one of the leading causes of cancer-related deaths among females worldwide. In this paper, we present an intelligent decision support system for the diagnosis of cervical cancer using risk factors recorded in a publicly available dataset. The dataset is heavily imbalanced between positive and negative instances. Although sampling techniques can be used to address this, the imbalance is so severe that oversampling or undersampling alone is insufficient to balance the classes adequately, which is crucial for appropriate diagnosis. Hence, we propose a novel resampling technique that hybridizes oversampling and undersampling to induce a proper balance between the two classes. The hybrid strategy ensures that neither the majority class nor the minority class suffers a reduction in performance or is overfitted, as would be the case if oversampling or undersampling were used alone. To further enhance the performance of the classifiers, a Genetic Algorithm (GA) is applied to identify the key risk factors for cervical cancer diagnosis. Using the optimized feature set of only 8 of the 32 features, as selected by the GA, the Random Forest classifier achieved the highest G-mean score of 94.47%, along with a sensitivity and specificity of 94.25% and 94.69%, respectively. Thus, our proposed hybrid resampling strategy effectively addresses class imbalance, while the GA identifies the most important features to maximize class separation, and the combination of the two provides the best performance for the diagnosis of cervical cancer.
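As a rough illustration of two ideas mentioned above, the following sketch computes the G-mean as the geometric mean of sensitivity and specificity (its standard definition) and shows one simple way oversampling and undersampling might be combined. The function names, the midpoint target size, and the use of plain random sampling are assumptions made for illustration only; the abstract does not specify the resampling procedure, so this should not be read as the method proposed in the paper.

```python
import math
import random


def g_mean(sensitivity: float, specificity: float) -> float:
    """Geometric mean of sensitivity and specificity, both given as fractions in [0, 1]."""
    return math.sqrt(sensitivity * specificity)


def hybrid_resample(majority, minority, seed=0):
    """Hypothetical hybrid resampling: bring both classes to the midpoint of their sizes.

    Assumption: the midpoint target and random sampling are placeholders chosen
    for this sketch, not the procedure described in the paper.
    """
    if not minority or len(majority) < len(minority):
        raise ValueError("expected a non-empty minority class no larger than the majority")
    rng = random.Random(seed)
    target = (len(majority) + len(minority)) // 2
    # Undersample: keep a random subset of the majority class, without replacement.
    kept_majority = rng.sample(majority, target)
    # Oversample: keep every minority row, then duplicate random rows up to the target.
    extra = [rng.choice(minority) for _ in range(target - len(minority))]
    return kept_majority, list(minority) + extra


# The reported sensitivity and specificity are consistent with the reported G-mean.
print(f"{g_mean(0.9425, 0.9469):.4f}")  # 0.9447

# Toy class sizes (hypothetical, not the dataset's actual counts).
negatives, positives = hybrid_resample(list(range(90)), list(range(100, 110)))
print(len(negatives), len(positives))  # 50 50
```

Meeting at an intermediate size means the majority class loses fewer rows than under pure undersampling, and the minority class is duplicated less than under pure oversampling, which is the trade-off the abstract describes.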