Abstract

There are two key issues in the field of ensemble learning: (1) the diversity of the base classifiers, and (2) the strategy for integrating multiple classifiers. In this paper, a special classifier structure, the decision group, is designed to increase the diversity of the base classifier pool, and a genetic algorithm (GA) is used to assign a weight to each base classifier, thereby improving classification performance by avoiding local extremes. Overall, this work presents an ensemble classification algorithm based on AdaBoost whose base classifiers are decision groups, each composed of a K-nearest-neighbor (KNN), a naive Bayes (NB), and a decision tree (C4.5) classifier. To address the high-dimensional, small-sample characteristics of cancer gene expression data, a simple ensemble algorithm with decision groups built from these three base classifiers is proposed. Experimental results show that the proposed algorithm outperforms existing ensemble learning methods, such as Bagging, Random Forest (RF), Rotation Forest (RoF), AdaBoost, AdaBoost-BPNN, AdaBoost-SVM, and AdaBoost-RF, and in particular performs better on small-sample and imbalanced gene expression data.
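As an illustration of the decision-group idea, the following is a minimal sketch, assuming scikit-learn and a hard-voting combination of the three named learners (with a standard entropy-based decision tree standing in for C4.5). It is not the paper's implementation: the GA-based weight search and the AdaBoost boosting loop are not reproduced here, and the `weights` values are placeholders that the GA would tune in the full method.

```python
# Sketch of one "decision group": a hard-vote combination of KNN,
# naive Bayes, and an entropy-based decision tree (C4.5-like stand-in).
# The GA weighting and AdaBoost loop from the paper are not shown;
# the `weights` below are placeholders a GA would optimize.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import accuracy_score

# Toy high-dimensional, small-sample data standing in for gene expression profiles.
X, y = make_classification(n_samples=100, n_features=500, n_informative=20,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

decision_group = VotingClassifier(
    estimators=[
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("nb", GaussianNB()),
        ("c45", DecisionTreeClassifier(criterion="entropy", random_state=0)),
    ],
    voting="hard",
    weights=[1.0, 1.0, 1.0],  # placeholder weights; the paper tunes these with a GA
)
decision_group.fit(X_tr, y_tr)
print("decision-group accuracy:", accuracy_score(y_te, decision_group.predict(X_te)))
```

In the full method, several such decision groups would serve as the weak learners of an AdaBoost-style ensemble, with the GA searching over the per-classifier weights.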
