Random ensemble oblique decision stumps for classifying gene expression data

Phuoc-Hai Huynh, Van Hoa Nguyen, Thanh-Nghi Do

doi:10.1145/3287921.3287987

Abstract

Cancer classification using microarray gene expression data is known to hold keys to fundamental problems in cancer diagnosis and drug discovery. However, classifying gene expression data is difficult because these data combine a very high-dimensional feature space with a small sample size. We investigate random ensembles of oblique decision stumps (RODS) based on the linear support vector machine (SVM), which are suitable for classifying very-high-dimensional microarray gene expression data. Our classification algorithms (called Bag-RODS and Boost-RODS) learn multiple oblique decision stumps by bagging and boosting, respectively, to form an ensemble of classifiers that is more accurate than a single model. Numerical test results on 50 very-high-dimensional microarray gene expression datasets from the Kent Ridge Biomedical and Array Expression repositories show that our proposed algorithms are more accurate than state-of-the-art classification models, including k nearest neighbors (kNN), SVM, decision trees, and ensembles of decision trees such as random forests, bagging, and AdaBoost.
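
To make the bagging variant concrete, the following is a minimal sketch of the general idea, not the authors' implementation. It assumes binary labels in {-1, +1}, a plain subgradient-descent solver for the linear SVM, and a random feature subset per stump; the function names (`train_linear_svm`, `fit_bagged_stumps`, `predict`) and all hyperparameter values are hypothetical choices made for illustration. The abstract does not specify the solver, the feature sampling, or the multi-class handling, and the boosting variant is not covered here.

```python
import random

def train_linear_svm(X, y, epochs=20, eta=0.1, lam=0.001, rng=None):
    """Fit a hyperplane (w, b) by subgradient descent on the
    L2-regularized hinge loss. Labels in y must be -1 or +1.
    (Assumed solver; the paper may use a different one.)"""
    rng = rng or random.Random(0)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            # Shrink w: gradient step on the L2 penalty term.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                # Point is misclassified or inside the margin: hinge-loss step.
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b

def fit_bagged_stumps(X, y, n_stumps=11, n_features=None, seed=0):
    """Train an ensemble of oblique stumps. Each stump is one hyperplane
    split learned on a bootstrap sample and a random feature subset
    (both sampling choices are assumptions of this sketch)."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    m = min(d, n_features or max(2, int(d ** 0.5)))
    ensemble = []
    for _ in range(n_stumps):
        rows = [rng.randrange(n) for _ in range(n)]   # bootstrap: sample rows with replacement
        cols = rng.sample(range(d), m)                # random subset of m features
        Xb = [[X[r][c] for c in cols] for r in rows]
        yb = [y[r] for r in rows]
        w, b = train_linear_svm(Xb, yb, rng=rng)
        ensemble.append((cols, w, b))
    return ensemble

def predict(ensemble, x):
    """Majority vote over the stumps; each stump votes with the sign of w.x + b."""
    votes = 0
    for cols, w, b in ensemble:
        score = sum(wj * x[c] for wj, c in zip(w, cols)) + b
        votes += 1 if score >= 0 else -1
    return 1 if votes >= 0 else -1
```

Each stump is "oblique" because its single split tests a linear combination of features rather than one feature against a threshold. Calling `fit_bagged_stumps(X, y)` on a list of feature rows and a list of ±1 labels returns the ensemble, and `predict(ensemble, x)` classifies one row. An odd `n_stumps` avoids tied votes.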

Full Text