Predicting Breast Cancer via Supervised Machine Learning Methods on Class Imbalanced Data

Keerthana Rajendran, Vinesh Thiruchelvam, Manoj Jayabalan

doi:10.14569/ijacsa.2020.0110808

Abstract

Breast cancer, the second leading cause of fatality, is a widespread global health concern among women. Predicting the occurrence of breast cancer from its risk factors paves the way to earlier diagnosis and faster, more efficient treatment. Although many predictive models for breast cancer have been developed, most are built from highly imbalanced data. Models trained on imbalanced data are usually biased towards the majority class, but in cancer diagnosis it is crucial to correctly identify patients with cancer, who are often the minority class. This study applies three class balancing techniques, namely oversampling (Synthetic Minority Oversampling Technique (SMOTE)), undersampling (SpreadSubsample) and a hybrid method (SMOTE and SpreadSubsample), to the Breast Cancer Surveillance Consortium (BCSC) dataset before training the supervised learning models. The algorithms employed are Naive Bayes, Bayesian Network, Random Forest and Decision Tree (C4.5). The balancing method that yielded the best performance across all four classifiers was then tested on the validation data to determine the final predictive model. Classifier performance was evaluated using the Receiver Operating Characteristic (ROC) curve, sensitivity, and specificity.
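The balancing and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic two-dimensional points rather than BCSC records, the `smote` function is a simplified version of the interpolation idea behind SMOTE, a plain random undersampler stands in for SpreadSubsample, and the function names, class sizes and `k` value are all assumptions chosen for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Simplified SMOTE: build n_new synthetic points, each interpolated
    between a random minority sample and one of its k nearest minority
    neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, excluding base itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def undersample(majority, n_keep, seed=0):
    """Random undersampling: keep n_keep majority samples, no replacement.
    (A stand-in for SpreadSubsample, not a reimplementation of it.)"""
    return random.Random(seed).sample(majority, n_keep)


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    Labels: 1 = cancer (positive), 0 = no cancer (negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Synthetic toy data (hypothetical): 200 majority vs 20 minority points.
rng = random.Random(42)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
minority = [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(20)]

# Hybrid balancing: oversample the minority class and undersample the
# majority class until both hold 60 samples.
new_points = smote(minority, n_new=40)
balanced_minority = minority + new_points
balanced_majority = undersample(majority, n_keep=60)
```

Using oversampling alone would keep all 200 majority points and synthesize 180 minority points; undersampling alone would cut the majority class down to 20. The hybrid variant trades a little of each. Sensitivity is the metric that directly reflects how many minority-class (cancer) cases a classifier catches, which is why it is reported alongside specificity and the ROC curve.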

Highlights

The World Health Organization reported in 2018 that there were 627,000 deaths worldwide due to breast cancer [1].
The class balancing methods were applied to the training dataset, which consisted of 180,465 instances.
This study was conducted using the Breast Cancer Surveillance Consortium (BCSC) dataset, which consisted of 280,660 screening mammography results and demographic profiles of breast cancer patients who are women aged 35 years and above.

Summary

Introduction

The World Health Organization reported in 2018 that there were 627,000 deaths worldwide due to breast cancer [1]. Breast cancer is the second most common cause of cancer death among women, especially in developing countries [2]. It accounts for 25% of all cancers among women and affects 10% of women globally at some stage of their life [3]. The problem is more acute in developing countries, where the mortality rate is higher because of the prohibitive cost of the extensive diagnostic tests and treatments required to treat breast cancer completely [4]. The multitude of diagnostic tests carried out to assess the cancer stage requires an extended period for clinicians to obtain medical results



Journal: International Journal of Advanced Computer Science and Applications
Publication Date: Jan 1, 2020
Citations: 14
License type: cc-by


