Abstract

Background: Breast cancer is one of the most critical cancers and is a major cause of cancer death among women. It is essential to know the survivability of patients in order to ease the decision-making process regarding medical treatment and financial preparation. Breast cancer data sets are imbalanced (i.e., the number of surviving patients outnumbers the number of non-surviving patients), whereas standard classifiers are not suited to imbalanced data sets. Methods to improve the survivability prognosis of breast cancer therefore need to be studied.

Methods: Two well-known five-year prognosis models/classifiers [i.e., logistic regression (LR) and decision tree (DT)] are constructed by combining them with the synthetic minority over-sampling technique (SMOTE), the cost-sensitive classifier technique (CSC), under-sampling, bagging, and boosting. A feature selection method is used to select relevant variables, while a pruning technique is applied to obtain models with a low information burden. These methods are applied to data obtained from the Surveillance, Epidemiology, and End Results database. The improvements in survivability prognosis of breast cancer are investigated based on the experimental results.

Results: Experimental results confirm that the DT and LR models combined with SMOTE, CSC, and under-sampling generate higher predictive performance consecutively than the original ones. Most of the time, DT and LR models combined with SMOTE and CSC carry a lower information burden (fewer features) when a feature selection method and a pruning technique are applied.

Conclusions: LR is found to have better statistical power than DT in predicting five-year survivability. CSC is superior to SMOTE, under-sampling, bagging, and boosting in improving the prognostic performance of DT and LR.
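To make the class-imbalance remedy concrete, the following is a minimal sketch of random under-sampling, one of the techniques named in the abstract. It is not the authors' implementation: the function name `random_undersample`, the 1:1 target ratio, the fixed seed, and the toy labels are all assumptions chosen for illustration.

```python
import random

def random_undersample(records, labels, seed=0):
    """Randomly drop records so every class is as small as the rarest class.

    Hypothetical helper for illustration; the 1:1 target ratio is an assumption.
    """
    rng = random.Random(seed)

    # Group records by class label.
    by_class = {}
    for record, label in zip(records, labels):
        by_class.setdefault(label, []).append(record)

    # Size of the smallest (minority) class.
    n_min = min(len(group) for group in by_class.values())

    # Keep n_min randomly chosen records from each class.
    out_records, out_labels = [], []
    for label, group in by_class.items():
        out_records.extend(rng.sample(group, n_min))
        out_labels.extend([label] * n_min)
    return out_records, out_labels

# Toy data: 8 surviving patients and 2 non-surviving patients (made-up values).
records = list(range(10))
labels = ["survived"] * 8 + ["not_survived"] * 2
balanced_records, balanced_labels = random_undersample(records, labels)
```

After the call, both classes contain two records each, so a classifier trained on the balanced sample is no longer dominated by the majority class, at the cost of discarding majority-class data.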

Highlights

  • Breast cancer is one of the most critical cancers and is a major cause of cancer death among women

  • Efficiency of all techniques: to show that SMOTE, the cost-sensitive classifier technique (CSC), under-sampling, bagging, and AdaboostM1 can improve the predictive performance of the original models, we input nine prognosis variables to construct the S_DT_9, S_LR_9, C_DT_9, C_LR_9, U_DT_9, U_LR_9, Ba_DT_9, Ba_LR_9, Ad_DT_9, and Ad_LR_9 models

  • We find that although the specificity decreases slightly when applying SMOTE, CSC, and under-sampling, the sensitivity and g-mean improve, while AUC values indicate that the performance of decision tree (DT) and logistic regression (LR) decreases slightly when applying SMOTE and AdaboostM1
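The trade-off described in the highlights can be illustrated with a short sketch that computes sensitivity, specificity, and g-mean from predicted labels. It uses the standard definition of g-mean as the square root of sensitivity times specificity. The function name, the choice of which class counts as positive, and the toy predictions are assumptions, not values from the study.

```python
import math

def sensitivity_specificity_gmean(y_true, y_pred, positive):
    """Return (sensitivity, specificity, g-mean) for a binary problem.

    `positive` names the class treated as positive; which class the study
    treats as positive is not stated here, so this is an assumption.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1

    # Guard against empty classes to avoid division by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity, math.sqrt(sensitivity * specificity)

# Toy example: 4 positive and 6 negative cases (made-up values).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec, gmean = sensitivity_specificity_gmean(y_true, y_pred, positive=1)
```

Because g-mean multiplies the two rates, a small loss in specificity can be outweighed by a larger gain in sensitivity, which is the pattern the highlight reports for SMOTE, CSC, and under-sampling.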


Summary

Introduction

Breast cancer data sets are imbalanced (i.e., the number of surviving patients outnumbers the number of non-surviving patients), whereas standard classifiers are not suited to imbalanced data sets. Methods to improve the survivability prognosis of breast cancer therefore need to be studied. The need to monitor the survivability of breast cancer patients is threefold. Breast cancer is one of the most critical cancers [1] and is a major cause of cancer death among women. DeSantis et al. [2] reported that in 2011, around 230,480 American women were diagnosed with invasive breast cancer and 39,520 breast cancer

