Abstract
Breast cancer, the second leading cause of fatality among women, is a widespread global health concern. Predicting the occurrence of breast cancer from risk factors paves the way to earlier diagnosis and faster, more efficient treatment. Although many predictive models for breast cancer have been developed in the past, most of them were generated from highly imbalanced data. Models trained on imbalanced data are usually biased towards the majority class, but in cancer diagnosis it is crucial to correctly identify the patients with cancer, who are often the minority class. This study applies three class balancing techniques, namely oversampling (Synthetic Minority Oversampling Technique (SMOTE)), undersampling (SpreadSubsample), and a hybrid method (SMOTE and SpreadSubsample), to the Breast Cancer Surveillance Consortium (BCSC) dataset before constructing the supervised learning models. The algorithms employed in this study include Naive Bayes, Bayesian Network, Random Forest, and Decision Tree (C4.5). The balancing method that yielded the best performance across all four classifiers was tested on the validation data to determine the final predictive model. The performance of the classifiers was evaluated using the Receiver Operating Characteristic (ROC) curve, sensitivity, and specificity.
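To make the evaluation measures concrete, the following is a minimal sketch of how sensitivity and specificity can be computed from binary predictions. The function name `sensitivity_specificity` and the toy labels are assumptions made for illustration; they are not taken from the study or from the BCSC data.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels.

    sensitivity = TP / (TP + FN): share of actual positives that were found.
    specificity = TN / (TN + FP): share of actual negatives that were found.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical toy labels: 1 = cancer (minority), 0 = no cancer (majority).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.3f}")
# prints: sensitivity=0.50, specificity=0.875
```

In this toy case 8 of 10 predictions are correct, yet only half of the positive cases are detected. This is the kind of gap that makes sensitivity and specificity more informative than accuracy alone when the positive class is the minority.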
Highlights
The World Health Organization reported in 2018 that there were 627,000 deaths worldwide due to breast cancer [1]
The class balancing methods were applied to the training dataset which consists of 180,465 instances
This study was conducted using the Breast Cancer Surveillance Consortium (BCSC) dataset which consisted of 280,660 screening mammography results and demographic profiles of breast cancer patients who are women aged 35 years and above
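The idea behind hybrid class balancing of a training set can be sketched as follows. This is an illustrative sketch only, not the SMOTE or SpreadSubsample implementations used in the study: it assumes purely numeric features, uses plain random undersampling for the majority class, and uses a simplified SMOTE-style interpolation for the minority class. All function names, the `target` parameter, and the toy data are hypothetical.

```python
import random


def smote_like_oversample(minority, n_new, k=3, rng=None):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    if rng is None:
        rng = random.Random(0)
    if n_new > 0 and len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def random_undersample(majority, n_keep, rng=None):
    """Keep a random subset of n_keep majority samples (no replacement)."""
    if rng is None:
        rng = random.Random(0)
    return rng.sample(majority, n_keep)


def hybrid_balance(majority, minority, target, k=3, seed=0):
    """Shrink the majority class and grow the minority class to `target`."""
    rng = random.Random(seed)
    kept = random_undersample(majority, min(target, len(majority)), rng)
    n_new = max(0, target - len(minority))
    return kept, minority + smote_like_oversample(minority, n_new, k, rng)


# Hypothetical toy training data with two numeric features per instance.
data_rng = random.Random(42)
majority = [[data_rng.random(), data_rng.random()] for _ in range(20)]
minority = [[2.0, 2.0], [2.5, 2.1], [2.2, 2.8], [2.9, 2.6]]

balanced_major, balanced_minor = hybrid_balance(majority, minority, target=10)
print(len(balanced_major), len(balanced_minor))  # prints: 10 10
```

Balancing is applied to the training portion only, so that the validation data keep their original class distribution. Risk factor datasets often contain nominal attributes, for which interpolation as shown here is not meaningful; a variant of SMOTE that handles nominal values would be needed in that case.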
Summary
The World Health Organization reported in 2018 that there were 627,000 deaths worldwide due to breast cancer [1]. Breast cancer is the second most common cause of cancer death among women, especially in developing countries [2]. This cancer type accounts for 25% of all cancers among women and affects 10% of women globally at some stage of their life [3]. It is a greater problem in developing countries, where the mortality rate is higher because of the prohibitive cost of the extensive diagnostic tests and treatments required to treat breast cancer completely [4]. The multitude of diagnostic tests carried out to assess the cancer stage means that clinicians need an extended period to obtain medical results.
More From: International Journal of Advanced Computer Science and Applications