Abstract

Cancer classification using microarray gene expression data is considered key to addressing fundamental problems in cancer diagnosis and drug discovery. However, classifying gene expression data is difficult because these data combine a very high-dimensional feature space with a small sample size. We investigate random ensembles of oblique decision stumps (RODS) based on linear support vector machines (SVMs), which are suitable for classifying very-high-dimensional microarray gene expression data. Our classification algorithms, called Bag-RODS and Boost-RODS, learn multiple oblique decision stumps by bagging and boosting, respectively, to form an ensemble of classifiers that is more accurate than a single model. Numerical test results on 50 very-high-dimensional microarray gene expression datasets from the Kent Ridge Biomedical and ArrayExpress repositories show that the proposed algorithms are more accurate than state-of-the-art classification models, including k-nearest neighbors (kNN), SVM, decision trees, and ensembles of decision trees such as random forests, bagging, and AdaBoost.
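
To make the bagging variant concrete, here is a minimal sketch of the general idea, not the authors' implementation. It assumes each oblique stump is a single linear split sign(w·x + b) trained on a bootstrap sample restricted to a random feature subset, and that the ensemble predicts by majority vote; the abstract does not spell out these details. The SGD hinge-loss trainer is a simple stand-in for a linear SVM solver, and all function names, parameters, and the toy data are hypothetical.

```python
import random

def train_stump(X, y, features, rng, epochs=20, lr=0.1, lam=0.01):
    """Fit one oblique stump: a single linear split sign(w.x + b) on the
    given feature subset, trained by SGD on the L2-regularized hinge loss
    (an assumed stand-in for a linear SVM solver)."""
    w = [0.0] * len(features)
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            xi = [X[i][f] for f in features]
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # Shrink step from the L2 penalty (bias included for simplicity).
            w = [(1.0 - lr * lam) * wj for wj in w]
            b *= (1.0 - lr * lam)
            if margin < 1.0:  # hinge loss is active: move toward the sample
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, xi)]
                b += lr * y[i]
    return features, w, b

def stump_predict(stump, x):
    """Return +1 or -1 depending on the side of the oblique split."""
    features, w, b = stump
    score = sum(wj * x[f] for wj, f in zip(w, features)) + b
    return 1 if score >= 0 else -1

def bag_stumps(X, y, n_stumps=11, n_features=5, seed=0):
    """Bagging: each stump sees a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    ensemble = []
    for _ in range(n_stumps):
        boot = [rng.randrange(n) for _ in range(n)]       # sample with replacement
        feats = rng.sample(range(d), min(n_features, d))  # random feature subset
        Xb = [X[i] for i in boot]
        yb = [y[i] for i in boot]
        ensemble.append(train_stump(Xb, yb, feats, rng))
    return ensemble

def ensemble_predict(ensemble, x):
    """Majority vote over the stumps (labels are +1 / -1)."""
    votes = sum(stump_predict(s, x) for s in ensemble)
    return 1 if votes >= 0 else -1

def make_toy_data(n_per_class=10, d=200, seed=1):
    """Hypothetical toy data: few samples, many features, two separated classes."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (1, -1):
        for _ in range(n_per_class):
            X.append([label * 2.0 + rng.uniform(-0.5, 0.5) for _ in range(d)])
            y.append(label)
    return X, y

X, y = make_toy_data()
model = bag_stumps(X, y)
accuracy = sum(ensemble_predict(model, x) == t for x, t in zip(X, y)) / len(y)
print(f"training accuracy on toy data: {accuracy:.2f}")
```

The toy data only mimics the "many features, few samples" shape of microarray data and says nothing about real-world accuracy. A boosting variant would instead reweight the training samples after each stump and weight the votes, which this sketch does not attempt.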

