Abstract

Because breast cancer is among the most serious diseases affecting women, early detection is a priority. The Wisconsin Breast Cancer Dataset (WBCD), retrieved from the UCI Machine Learning Repository, has been used in numerous recent studies to support diagnosis. Machine learning (ML) algorithms such as K-Nearest Neighbor (KNN), Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), Naive Bayes (NB), Support Vector Machine (SVM), and Neural Network (NN) can be applied to this task. Although these algorithms predict well, their results depend on the particular dataset and do not by themselves reveal which factors drive the diagnosis. To identify the factors that most influence the prediction in a statistical sense, this paper uses the WBCD, compares five methods, and selects the three best classifiers: KNN, RF, and SVM. The author then removes one variable at a time, records the resulting accuracy, and compares it with the accuracy of the full model. With the remaining variables held fixed, the results indicate that Clump Thickness and Bare Nuclei are the factors that matter most.
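The one-variable-at-a-time comparison described above can be illustrated with a minimal sketch. It is not the author's code: it assumes a small hand-written KNN classifier in place of the paper's KNN, RF, and SVM models, and it runs on synthetic data rather than the WBCD. The feature names (`informative`, `noise_a`, `noise_b`) and all function names are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k training rows closest to x (squared Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k=5):
    """Fraction of test rows whose predicted label matches the true label."""
    hits = sum(
        knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y)
    )
    return hits / len(test_y)


def drop_column(X, j):
    """Return a copy of X with column j removed from every row."""
    return [row[:j] + row[j + 1:] for row in X]


def ablation(train_X, train_y, test_X, test_y, names, k=5):
    """Compare full-model accuracy with the accuracy after removing each feature."""
    full = accuracy(train_X, train_y, test_X, test_y, k)
    drops = {}
    for j, name in enumerate(names):
        reduced = accuracy(
            drop_column(train_X, j), train_y, drop_column(test_X, j), test_y, k
        )
        drops[name] = full - reduced  # positive: removing the feature hurt accuracy
    return full, drops


# Synthetic stand-in for the real data (an assumption, not the WBCD):
# one feature separates the two classes, the other two are pure noise.
rng = random.Random(0)
names = ["informative", "noise_a", "noise_b"]
X, y = [], []
for i in range(200):
    label = i % 2
    X.append([rng.gauss(4.0 * label, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)])
    y.append(label)

train_X, test_X = X[:140], X[140:]
train_y, test_y = y[:140], y[140:]

full_acc, drops = ablation(train_X, train_y, test_X, test_y, names)
print(f"full model accuracy: {full_acc:.3f}")
for name, drop in sorted(drops.items(), key=lambda item: -item[1]):
    print(f"without {name}: accuracy change {-drop:+.3f}")
```

A feature whose removal causes the largest drop in accuracy is the one the classifier relies on most. A single train/test split is used here for brevity; repeating the comparison over several splits would give a more stable ranking.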
