Abstract

Analyzing colon cancer data is essential for improving early detection, treatment outcomes, public health initiatives, research efforts, and overall patient care, and ultimately for reducing the burden of the disease. The quality of any disease prediction depends on the quality of the available dataset, so it is important to analyze the dataset's characteristics before applying a prediction algorithm. This research presents a comprehensive framework for addressing class imbalance and high dimensionality in colon cancer datasets, both of which have been significant challenges in previous studies on colon cancer prediction. Both are addressed during preprocessing. Balancing refers to adjusting the data points so that each class label is represented in an appropriate proportion. Feature selection is the process of selecting the strongest features from the available feature space. This study aims to improve the performance of popular tree, rule, and lazy (KNN) classifiers and the support vector machine (SVM) algorithm by addressing the imbalance issue and applying several feature selection methods: Chi-Square, Symmetrical Uncertainty, CFS Subset, and Classifier Subset evaluators. The proposed framework shows that, after balancing the dataset, all the algorithms performed better with every feature selection method applied. Across the methods, JRip records 85.71% accuracy with the Classifier Subset evaluator, Ridor 84.52% with CFS, J48 83.33% with both CFS and the Classifier Subset evaluator, Simple Cart 84.52% with the Classifier Subset evaluator, KNN 91.66% with Chi-Square and CFS, and SVM 92.85% with Symmetrical Uncertainty.
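The two preprocessing steps described above can be illustrated with a minimal sketch. The abstract does not say which balancing technique or software was used, so the random oversampling below is an assumption chosen for simplicity, and the function names and toy data are hypothetical. The symmetrical uncertainty score follows the standard definition SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)) for discrete features.

```python
import math
import random
from collections import Counter


def oversample(rows, labels, seed=0):
    """Randomly duplicate minority-class rows until every class
    has as many rows as the largest class (assumed balancing method)."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(feature, labels):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)); ranges from 0 to 1."""
    h_x, h_y = entropy(feature), entropy(labels)
    if h_x + h_y == 0:
        return 0.0
    h_xy = entropy(list(zip(feature, labels)))  # joint entropy H(X, Y)
    return 2 * (h_x + h_y - h_xy) / (h_x + h_y)


# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
rows = [(1, 0), (1, 1), (1, 0), (1, 1), (0, 0), (0, 1)]
labels = ["tumor", "tumor", "tumor", "tumor", "normal", "normal"]

bal_rows, bal_labels = oversample(rows, labels)
scores = [
    symmetrical_uncertainty([row[i] for row in bal_rows], bal_labels)
    for i in range(2)
]
```

Ranking features by such a score and keeping the highest-ranked ones is one way a filter method like Symmetrical Uncertainty can reduce dimensionality before a classifier is trained; continuous attributes would first need to be discretized.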
