Abstract

Breast cancer has a high mortality rate and is a leading cause of death among women worldwide, so machine learning (ML) algorithms that detect early signs of benign and malignant tumors have received growing attention. ML classifiers can make the evaluation of mammographic results more efficient than manual classification of extensive patient data by radiologists. This study evaluates the effectiveness of the k-Nearest Neighbor (kNN) classifier in characterizing cancer tumor stages based on concavity, texture, area, perimeter, and smoothness. We use scatterplots to differentiate between the benign and malignant classes in the Breast Cancer Wisconsin Dataset (WBCD) from the University of California at Irvine Machine Learning Repository. Using the k-Fold Cross Validation (k-FCV) technique, we determine the optimal value of k for assigning unlabeled data to their respective categories. Across four distinct tests, the analysis finds that the most favorable value of the hyperparameter k is 12, which yields a highly effective diagnostic outcome. Because k has no predefined value, guessing it could lead to accuracy errors and misdiagnosis; k-FCV therefore provides a more precise way to determine the class of tumors with unknown labels. We also preprocess the dataset and measure how different data splits affect accuracy, in order to organize the data effectively and obtain reliable results. Because early detection is essential to preventing breast cancer-related deaths, ML techniques like kNN could help reduce the mortality associated with the disease.
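
The selection procedure described above, choosing k for kNN by k-fold cross validation, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the synthetic two-feature data standing in for the WBCD, the 5 folds, and the candidate range of 1 to 15 are all assumptions made for illustration. The sketch also skips feature scaling, which would matter on real features such as area and smoothness because they sit on very different scales.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, query, k):
    """Predict a label by majority vote among the k nearest training points."""
    # Rank training points by Euclidean distance to the query and keep the k closest.
    nearest = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def cross_val_accuracy(X, y, k, n_folds=5, seed=0):
    """Mean accuracy of kNN with the given k over n_folds held-out folds."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into n_folds roughly equal folds.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train_idx = [i for i in indices if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(
            knn_predict(train_X, train_y, X[i], k) == y[i] for i in fold
        )
        scores.append(correct / len(fold))
    return sum(scores) / len(scores)


def select_k(X, y, candidates, n_folds=5):
    """Return the candidate k with the highest cross-validated accuracy."""
    return max(candidates, key=lambda k: cross_val_accuracy(X, y, k, n_folds))


def make_toy_data(n_per_class=40, seed=1):
    """Hypothetical stand-in for the WBCD: two Gaussian clusters in 2-D."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in (("benign", 0.0), ("malignant", 3.0)):
        for _ in range(n_per_class):
            X.append((rng.gauss(center, 1.0), rng.gauss(center, 1.0)))
            y.append(label)
    return X, y


X, y = make_toy_data()
best_k = select_k(X, y, candidates=range(1, 16))
print("selected k:", best_k)
print("cross-validated accuracy:", cross_val_accuracy(X, y, best_k))
```

The value selected on this toy data has no bearing on the k = 12 reported in the study; the sketch only shows why cross-validated accuracy, rather than guesswork, can be used to compare candidate values of k.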

