Finding optimal classifiers for small feature sets in genomics and proteomics

Gregor Stiglic,Juan J Rodriguez,Peter Kokol

doi:10.1016/j.neucom.2010.02.024

Abstract

The classification of genomic and proteomic data in extremely high dimensional datasets is a well-known problem which requires appropriate classification techniques. Classification methods are usually combined with gene selection techniques to provide optimal classification conditions—i.e. a lower dimensional classification environment. Another reason for reducing the dimensionality of such datasets is their interpretability, as it is much easier to interpret a small set of ranked genes than 20 thousand genes. This paper evaluates the classification performance of Rotation Forest classifier on small subsets of ranked genes for two dataset collections consisting of 47 genomic and proteomic classification problems. Robustness and high classification accuracy is shown to be an important feature of Rotation Forest when applied to small sets of genes.

Full Text