Robustness analysis of diversified ensemble decision tree algorithms for Microarray data classification

Grant Daggard, Jiu-Yong Li, Hong Hu, Li-Zhen Wang, Hua Wang

doi:10.1109/icmlc.2008.4620389


Open Access



Abstract

Ensemble classification methods have shown promise for achieving higher accuracy in microarray data classification. Because noise values remain in all microarray data even after preprocessing, robustness is another very important criterion, in addition to accuracy, for evaluating the reliability of microarray classification algorithms. In this paper, we experimentally compare our newly developed MDMT with C4.5, BaggingC4.5, AdaBoostingC4.5, Random Forest and CS4 on four microarray cancer data sets. We test and evaluate how well a given single or ensemble classifier tolerates noise in unseen test data sets, particularly as the level of noise increases. The experimental results show that MDMT tolerates noise values in unseen test data sets better than the other compared methods do, particularly as the level of noise increases. We observe that Random Forest is comparable to MDMT in terms of resistance to noise. The experimental results also show that ensemble decision tree methods tolerate noise values better than the single-tree C4.5 does. We conclude that avoiding overlapping genes among the ensemble trees is an intuitive, simple and effective way to achieve a higher degree of diversity for ensemble decision tree methods. An algorithm based on this principle is more reliable when dealing with microarray data sets that contain a certain level of noise.
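The evaluation protocol and the no-shared-genes principle described in the abstract can be pictured with a minimal Python sketch. This is not the authors' MDMT implementation. One-gene decision stumps stand in for C4.5 trees, the data are synthetic, and the noise model (Gaussian perturbation of a random fraction of test values) is an assumption, since the abstract does not specify these details. All function names are hypothetical.

```python
import random


def make_data(n, n_genes, n_informative, rng):
    """Synthetic two-class data (assumption): informative genes shift with the class."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(2.0 * label if g < n_informative else 0.0, 1.0)
               for g in range(n_genes)]
        X.append(row)
        y.append(label)
    return X, y


def train_stump(X, y, allowed):
    """Return (gene, threshold, label_above) with the fewest training errors."""
    best = None
    for g in allowed:
        values = sorted(set(row[g] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            for above in (0, 1):
                errors = sum((above if row[g] > t else 1 - above) != label
                             for row, label in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, g, t, above)
    return best[1:]


def train_disjoint_ensemble(X, y, n_trees):
    """Train stumps one by one; a gene used by one stump is barred from later ones."""
    allowed = set(range(len(X[0])))
    ensemble = []
    for _ in range(n_trees):
        stump = train_stump(X, y, sorted(allowed))
        ensemble.append(stump)
        allowed.discard(stump[0])  # enforce no overlapping genes
    return ensemble


def predict(ensemble, row):
    """Majority vote over the stumps (use an odd ensemble size to avoid ties)."""
    votes = sum(above if row[g] > t else 1 - above for g, t, above in ensemble)
    return 1 if 2 * votes > len(ensemble) else 0


def add_noise(X, level, rng, scale=1.0):
    """Perturb each value with probability `level` (assumed noise model)."""
    return [[v + rng.gauss(0.0, scale) if rng.random() < level else v
             for v in row] for row in X]


def accuracy(ensemble, X, y):
    return sum(predict(ensemble, row) == label for row, label in zip(X, y)) / len(y)


rng = random.Random(0)
X_train, y_train = make_data(40, 30, 5, rng)
X_test, y_test = make_data(40, 30, 5, rng)

ensemble = train_disjoint_ensemble(X_train, y_train, n_trees=5)
genes_used = [g for g, _, _ in ensemble]

# Accuracy on the unseen test set at increasing noise levels.
results = {level: accuracy(ensemble, add_noise(X_test, level, rng), y_test)
           for level in (0.0, 0.1, 0.2, 0.4)}
for level, acc in results.items():
    print(f"noise level {level:.1f}: accuracy {acc:.2f}")
```

Only the test set is perturbed, matching the abstract's focus on noise in unseen test data. To compare methods in this setup, one would evaluate each classifier on the same noisy test sets and look at how quickly accuracy falls as the noise level rises.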

Full Text