A Comparative Study of Ensemble Learning Approaches in the Classification of Breast Cancer Metastasis

Wangshu Zhang, Rui Jiang, Xuegong Zhang, Xuebing Wu, Feng Zeng

doi:10.1109/ijcbs.2009.23

Abstract

The combined use of gene expression profiles and protein-protein interaction (PPI) networks has recently shed light on breast cancer research by selecting a small number of subnetworks as disease markers and then using them to classify metastasis. Based on previously identified subnetwork markers, we compare three ensemble learning approaches (AdaBoost, LogitBoost and random forest) with two widely used classifiers (logistic regression and support vector machine) in the classification of breast cancer metastasis. In leave-one-out cross-validation experiments on two breast cancer data sets, the ensemble learning methods outperform logistic regression and support vector machine by 22.4% and 4.8%, respectively, in classification accuracy. In cross-data-set validation experiments, the ensemble learning methods also demonstrate superior reproducibility over the other two methods. From these results, we infer that ensemble learning approaches with subnetwork markers might be more suitable for classifying breast cancer metastasis, and we recommend their use in similar classification problems.

Full Text