Abstract
As one of the most common types of cancer among women, breast cancer is a serious health concern worldwide. Early detection is crucial for successful treatment and improved survival rates. However, detecting breast cancer is challenging because the classification problem is imbalanced: the minority class (cancerous) is considerably smaller than the majority class (non-cancerous). In this paper, we explore the use of logistic regression (LR) and the adaptive synthetic (ADASYN) resampling technique to address imbalanced classification in breast cancer detection. To that end, we used the Wisconsin Breast Cancer dataset, which contains 569 instances. The dataset is imbalanced, with 212 malignant (cancerous) cases and 357 benign (non-cancerous) cases. We first trained support vector machine, LR, K-nearest neighbor, gradient boosting, and adaptive boosting classifiers on the imbalanced dataset. We then trained the same algorithms on data resampled with ADASYN oversampling and evaluated their performance using 5-fold cross-validation. The results of the experiment showed that using ADASYN with LR significantly improved the performance of the LR model. The LR model achieved 99.46% accuracy on breast cancer diagnosis. Moreover, the confusion matrix shows that, among the 188 samples, the model misclassified one cancerous instance. We therefore conclude that the proposed model is effective for breast cancer diagnosis.
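To make the oversampling step concrete, the following is a minimal, standard-library sketch of the ADASYN idea: minority points with more majority-class neighbours receive more synthetic samples, and each synthetic sample is an interpolation between a minority point and one of its minority neighbours. It is not the authors' implementation. The function name `adasyn`, the parameters `k`, `beta`, and `seed`, the even-spread fallback, and the toy two-dimensional data are all assumptions made for illustration; the toy data is not the Wisconsin dataset.

```python
import math
import random


def adasyn(X, y, minority=1, k=5, beta=1.0, seed=0):
    """Return synthetic minority samples generated in the ADASYN style.

    X is a list of feature lists and y a list of labels. The defaults for
    k, beta, and seed are illustrative, not values taken from the paper.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    n_min = len(minority_idx)
    n_maj = len(y) - n_min
    # Number of synthetic samples needed; beta=1.0 targets full balance.
    n_needed = int((n_maj - n_min) * beta)

    def nearest(i, candidates, count):
        # Indices of the `count` points in `candidates` closest to X[i].
        others = [j for j in candidates if j != i]
        others.sort(key=lambda j: math.dist(X[i], X[j]))
        return others[:count]

    # r_i: share of majority-class points among the k nearest neighbours.
    ratios = []
    for i in minority_idx:
        neighbours = nearest(i, range(len(X)), k)
        ratios.append(sum(y[j] != minority for j in neighbours) / len(neighbours))

    total = sum(ratios)
    if total == 0:
        # Assumed fallback: no minority point borders the majority class,
        # so spread the synthetic samples evenly.
        weights = [1 / n_min] * n_min
    else:
        weights = [r / total for r in ratios]

    synthetic = []
    for i, w in zip(minority_idx, weights):
        # Harder-to-learn points (larger weight) get more synthetic samples.
        # Rounding means the total can differ slightly from n_needed.
        g_i = round(w * n_needed)
        partners = nearest(i, minority_idx, k)
        for _ in range(g_i):
            j = rng.choice(partners)
            lam = rng.random()  # interpolation factor in [0, 1)
            synthetic.append([a + lam * (b - a) for a, b in zip(X[i], X[j])])
    return synthetic


# Toy 2-D data (hypothetical): 30 majority points and 12 minority points.
data_rng = random.Random(42)
X_maj = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(30)]
X_min = [[data_rng.gauss(1.5, 1), data_rng.gauss(1.5, 1)] for _ in range(12)]
X = X_maj + X_min
y = [0] * 30 + [1] * 12

new_points = adasyn(X, y)
print(len(X_min), "minority +", len(new_points), "synthetic vs", len(X_maj), "majority")
```

In practice one would more likely rely on a maintained implementation and apply the resampling only to the training portion of each cross-validation fold, so that synthetic points derived from held-out samples do not inflate the reported score.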