Abstract

This study applies machine learning to breast cancer diagnosis. Breast cancer is among the most feared diseases for women because it can be fatal, and breast cancer death rates in women are rising every year. Early prediction can increase life expectancy and reduce breast cancer mortality. However, breast cancer data is imbalanced, which degrades the performance of machine learning methods: in the data used here, the Benign class (357 instances) outnumbers the Malignant class (212 instances). This study therefore addresses the class imbalance by combining SMOTE variants with a Random Forest classifier. SMOTE with Random Forest outperformed Borderline SMOTE with Random Forest on the breast cancer data, with an accuracy of 97.3%, sensitivity of 96.9%, and specificity of 97.8%, compared with an accuracy of 96.4%, sensitivity of 95.6%, and specificity of 96.9% for Borderline SMOTE with Random Forest. These results suggest that the proposed method, which achieved high accuracy in this study, can contribute to breast cancer prediction.
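
The oversampling idea and the three reported metrics can be illustrated with a minimal sketch. This is not the authors' implementation: it is a simplified, standard-library-only version of SMOTE-style interpolation, and the Random Forest step is omitted. The names `smote_sketch` and `diagnostic_metrics`, the toy feature points, and the confusion-matrix counts are all hypothetical; only the class counts (357 and 212) come from the abstract.

```python
import random
from math import dist


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority points by neighbour interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority-class neighbours of the base point.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Place the new point at a random position on the segment
        # between the base point and the chosen neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def diagnostic_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion counts."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn)  # share of actual positives detected
    specificity = tn / (tn + fp)  # share of actual negatives detected
    return accuracy, sensitivity, specificity


# Class counts from the abstract: synthetic samples needed for a 1:1 balance.
n_needed = 357 - 212

# Hypothetical two-feature minority points, for illustration only.
toy_minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3), (0.9, 0.7)]
synthetic = smote_sketch(toy_minority, n_new=4, k=3)

# Hypothetical confusion counts, not results from the study.
accuracy, sensitivity, specificity = diagnostic_metrics(tp=9, fn=1, tn=18, fp=2)
```

Because each synthetic point lies on a segment between two existing minority samples, the new points stay inside the region the minority class already occupies. Borderline SMOTE, as generally described, restricts the base points to minority samples near the class boundary; that variant is not shown here.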
