Multi-class Classification for Breast Cancer with High Dimensional Microarray Data Using Machine Learning Classifier

Mohammad Nasir Abdullah,Nik Nur Fatin Fatihah Sapri,Bee Wah Yap,Wan Fairos Wan Yaacob

doi:10.1007/978-981-99-0741-0_24

Abstract

Breast cancer is one of the leading causes of cancer related deaths among women. Early detection of breast cancer is very important for proper treatment and decreasing the death risk among women. Most cancer prediction study focused on binary classification of breast cancer. This study focused on multi-class classification of breast cancer with high dimensional microarray data. The dataset involved 38 cancer patients, 3 categories: normal (9), early tumour (12), and late tumor (17), and 39,426 microarray biomarkers. Boruta’s feature selection algorithm selected 28 important microarray biomarkers. The performance of support vector machine, multinomial logistic regression, Naïve Bayes, and random forest were evaluated based on macro and micro accuracy, sensitivity, and precision. Results showed that multinomial logistic regression, Naïve Bayes and random forest exhibits overfitting issue. However, support vector machine performed well in multi-classification of breast cancer (macro_acctest = 86.7%, macro_sentest = 77.8%, and macro_prectest = 62.0%). In future work, bagging, and boosting with over sampling techniques can be considered to improve multi-class classification of breast cancer using high dimensional microarray data.

Full Text