Abstract
Breast cancer is one of the most common cancers among women and a serious health concern worldwide. Early detection is crucial for successful treatment and improved survival rates. However, detecting breast cancer is challenging because the classification problem is imbalanced: the minority class (cancerous) is considerably smaller than the majority class (non-cancerous). In this paper, we explore the use of logistic regression (LR) together with the adaptive synthetic (ADASYN) resampling technique to address imbalanced classification in breast cancer detection. To that end, we used the Wisconsin Breast Cancer dataset, which contains 569 instances. The dataset is imbalanced, with 212 malignant (cancerous) cases and 357 benign (non-cancerous) cases. We first trained support vector machine, LR, K-nearest neighbor, gradient boosting, and adaptive boosting classifiers on the imbalanced dataset. We then trained the same algorithms on data resampled with ADASYN oversampling and evaluated their performance using 5-fold cross-validation. The results of the experiment show that using ADASYN with LR significantly improved the performance of the LR model, which achieves 99.46% accuracy on breast cancer diagnosis. Moreover, the confusion matrix shows that, among 188 samples, the model misclassified one cancerous instance. We therefore conclude that the proposed model is effective for breast cancer diagnosis.
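To make the resampling step concrete, the following is a minimal pure-Python sketch of the ADASYN idea: minority points that have more majority-class neighbors receive more synthetic samples, and each synthetic sample is interpolated between a minority point and one of its minority neighbors. It is an illustration under stated assumptions, not the implementation used in the paper: the function names `nearest_indices` and `adasyn`, the parameters `k`, `beta`, and `seed`, and the toy 2-D data are all hypothetical. In practice one would more likely use a library implementation such as `ADASYN` from imbalanced-learn.

```python
import math
import random


def nearest_indices(i, points, k):
    """Indices of the k points closest to points[i] (Euclidean), excluding i."""
    others = (j for j in range(len(points)) if j != i)
    return sorted(others, key=lambda j: math.dist(points[i], points[j]))[:k]


def adasyn(minority, majority, k=5, beta=1.0, seed=0):
    """Return synthetic minority samples (simplified ADASYN sketch).

    beta=1.0 targets a fully balanced dataset after oversampling.
    """
    rng = random.Random(seed)
    n_min = len(minority)
    # Number of synthetic samples needed to reach the desired balance.
    total = int((len(majority) - n_min) * beta)
    everything = minority + majority  # indices >= n_min are majority points

    # Difficulty of each minority point: share of majority points
    # among its k nearest neighbors in the whole dataset.
    ratios = []
    for i in range(n_min):
        neighbors = nearest_indices(i, everything, k)
        ratios.append(sum(j >= n_min for j in neighbors) / k)

    norm = sum(ratios)
    if norm == 0:  # no minority point borders the majority class
        return []

    synthetic = []
    for i, ratio in enumerate(ratios):
        count = round(ratio / norm * total)  # harder points get more samples
        minority_neighbors = nearest_indices(i, minority, k)
        for _ in range(count):
            z = minority[rng.choice(minority_neighbors)]
            lam = rng.random()  # interpolation weight in [0, 1)
            synthetic.append(
                tuple(a + lam * (b - a) for a, b in zip(minority[i], z))
            )
    return synthetic


# Toy data (hypothetical): 6 minority points, 16 majority points on a grid.
minority = [(1.1, 1.1), (1.2, 0.9), (0.9, 1.3),
            (2.0, 2.0), (2.1, 2.2), (2.2, 1.9)]
majority = [(x * 0.5, y * 0.5) for x in range(4) for y in range(4)]

synthetic = adasyn(minority, majority, k=3)
print(len(minority), "->", len(minority) + len(synthetic), "minority samples")
```

Because each per-point count is rounded, the number of generated samples can differ slightly from the exact class gap. When oversampling is combined with cross-validation, a common precaution is to resample only the training folds so that synthetic points derived from held-out data do not leak into training; the abstract does not state which protocol was followed.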