Abstract

Since breast cancer is one of the most serious diseases affecting women, early detection is a priority. The Wisconsin Breast Cancer Dataset (WBCD), retrieved from the UCI Machine Learning Repository, has been used in numerous recent studies to support diagnosis. Machine learning (ML) algorithms such as K-Nearest Neighbor (KNN), Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), Naive Bayes (NB), Support Vector Machine (SVM), and Neural Network (NN) can be applied to this prediction task. Although these algorithms predict well, their strengths do not outweigh a key drawback: the results depend on the particular dataset and do not directly reveal which underlying factors drive the diagnosis. To use ML to identify, in statistical terms, the factors that most influence the prediction, this paper uses the WBCD, compares five of these methods, and selects the three best classifiers: KNN, RF, and SVM. For each selected classifier, the author then removes one variable at a time, measures the resulting accuracy, and compares it with the accuracy of the full model. With all other variables held fixed, the results indicate that Clump Thickness and Bare Nuclei are the most influential factors.
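The leave-one-variable-out comparison described above can be sketched as follows. This is a minimal, stdlib-only Python sketch, not the author's code: it assumes a plain k-nearest-neighbour classifier (k = 3, Euclidean distance), a single 70/30 train/test split, and it omits the RF and SVM models. The data are a hypothetical synthetic toy set with three made-up features (`signal`, `noise_a`, `noise_b`), not the WBCD; only `signal` carries class information, so removing it should produce the largest accuracy drop.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training rows closest to x (Euclidean distance)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k):
    """Fraction of test rows whose KNN prediction matches the true label."""
    correct = sum(knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y))
    return correct / len(test_y)


def drop_column(rows, j):
    """Return a copy of the rows with feature j removed."""
    return [row[:j] + row[j + 1:] for row in rows]


def leave_one_feature_out(X, y, names, k=3, train_frac=0.7, seed=0):
    """Return (full-model accuracy, {feature: accuracy lost when that feature is removed})."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)  # one fixed, reproducible split
    cut = int(train_frac * len(idx))
    tr, te = idx[:cut], idx[cut:]
    train_X, train_y = [X[i] for i in tr], [y[i] for i in tr]
    test_X, test_y = [X[i] for i in te], [y[i] for i in te]

    full = accuracy(train_X, train_y, test_X, test_y, k)
    drops = {}
    for j, name in enumerate(names):
        reduced = accuracy(drop_column(train_X, j), train_y, drop_column(test_X, j), test_y, k)
        drops[name] = full - reduced  # larger drop = more influential feature
    return full, drops


# Hypothetical synthetic data: the class is encoded only in "signal";
# "noise_a" and "noise_b" are uniform noise unrelated to the label.
rng = random.Random(42)
names = ["signal", "noise_a", "noise_b"]
y = [i % 2 for i in range(200)]
X = [[10 * label + rng.random(), rng.random(), rng.random()] for label in y]

full, drops = leave_one_feature_out(X, y, names)
print("full-model accuracy:", full)
for name, drop in sorted(drops.items(), key=lambda kv: -kv[1]):
    print(f"{name}: accuracy drop {drop:.3f}")
```

On the real WBCD one would pass its nine attributes (including Clump Thickness and Bare Nuclei) instead of the toy features, and could substitute scikit-learn's `KNeighborsClassifier`, `RandomForestClassifier`, or `SVC` for `knn_predict`. Averaging the drops over several splits or cross-validation folds would make the ranking less sensitive to any single split.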
