Feature Selection in Cancer Classification from mRNA Data Based on Localized Dimension Reduction

Rami N Mahdi, Eric C Rouchka

doi:10.1109/icmla.2009.81

Abstract

In response to the ICMLA 2009 Functional Clustering of Gene Expression Profiles in Human Cancers Challenge, we present a new dimension reduction approach that ranks features by their localized discriminative power. The method adds a localized dimension reduction penalty to the objective function used to train a hyper basis function (hyper BF, or generalized RBF) neural network. Localized dimension reduction in a hyper BF network is motivated by the observation that many patterns are recognized as belonging to a class because specific values co-occur along a specific set of dimensions. Furthermore, a single class is likely to be composed of multiple sub-models with different characteristics. The method is applied iteratively: at each iteration, features are ranked and a smaller subset is passed to the next iteration. A support vector machine (SVM) evaluates the remaining dimensions in a ten-fold cross-validation setting. Experimental results show that the method reduces the number of dimensions from 54,613 to 65 while increasing the cross-validation classification accuracy from 92.3% to 96.4%. The challenge coordinators reported a classification accuracy of 94% for the final model on unseen data.
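The abstract does not give the exact form of the localized penalty, so the following is only a minimal NumPy sketch of the general idea under stated assumptions, not the authors' implementation. It assumes each hyper BF unit j has its own per-dimension scale s[j, d]; that the localized dimension reduction penalty is an L1 term lam * sum |s[j, d]|, applied with a soft-thresholding (proximal) step so a unit can drop dimensions it does not use; that centres are fixed at randomly chosen training samples; that a feature's score is its largest scale over all units (it matters if any sub-model relies on it); and that half of the features are kept at each iteration. The SVM ten-fold cross-validation step described in the abstract is omitted. The names `train_hyperbf`, `rank_features`, `iterative_selection`, `lam`, and `keep_fraction` are all hypothetical.

```python
import numpy as np

def train_hyperbf(X, T, n_units=10, lam=1e-3, lr=0.1, epochs=300, seed=0):
    """Fit a hyper BF network with per-unit, per-dimension scales S[j, d].

    Assumption: the localized dimension reduction penalty is an L1 term
    lam * sum|S[j, d]|, handled by a soft-thresholding (proximal) step.
    """
    rng = np.random.default_rng(seed)
    N, D = X.shape
    C = X[rng.choice(N, size=n_units, replace=False)]  # centres fixed at random samples
    S = np.full((n_units, D), 1.0 / np.sqrt(D))         # per-unit dimension scales
    W = np.zeros((n_units, T.shape[1]))                 # linear output weights
    D2 = (X[:, None, :] - C[None, :, :]) ** 2           # (N, J, D) squared offsets
    for _ in range(epochs):
        Phi = np.exp(-np.einsum("njd,jd->nj", D2, S ** 2))  # unit activations (N, J)
        E = Phi @ W - T                                     # residuals of squared loss
        gW = Phi.T @ E / N                                  # dLoss/dW
        gZ = -(E @ W.T / N) * Phi                           # dLoss/dZ, Z = weighted distance
        gS = 2 * S * np.einsum("nj,njd->jd", gZ, D2)        # dLoss/dS (smooth part)
        W -= lr * gW
        S -= lr * gS
        # Proximal step: scales below lr*lam become exactly 0, so each unit
        # can discard dimensions locally, independently of the other units.
        S = np.sign(S) * np.maximum(np.abs(S) - lr * lam, 0.0)
    return C, S, W

def rank_features(S):
    """Score each feature by its largest |scale| over all units (localized relevance)."""
    score = np.abs(S).max(axis=0)
    return np.argsort(-score), score

def iterative_selection(X, T, n_final, keep_fraction=0.5, **kw):
    """Repeatedly retrain, rank, and keep a smaller subset of features."""
    assert 0 < keep_fraction < 1 and n_final >= 1
    features = np.arange(X.shape[1])
    while len(features) > n_final:
        _, S, _ = train_hyperbf(X[:, features], T, **kw)
        order, _ = rank_features(S)
        n_keep = max(n_final, int(len(features) * keep_fraction))
        features = features[order[:n_keep]]
        # The paper evaluates each surviving subset with an SVM under
        # ten-fold cross-validation; that step is omitted in this sketch.
    return features

# Usage on small synthetic data: the class depends on the co-occurrence of
# positive values in features 0 and 1 (20 features, 2 classes, one-hot targets).
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 20))
labels = ((X[:, 0] > 0) & (X[:, 1] > 0)).astype(int)
T = np.eye(2)[labels]
selected = iterative_selection(X, T, n_final=5)  # 20 -> 10 -> 5 features
print(sorted(selected.tolist()))
```

Taking the maximum over units, rather than a sum, reflects the abstract's point that a class may consist of several sub-models: a feature used strongly by only one unit is still retained. For the full 54,613-dimensional data, the dense (N, J, D) offset array would need to be replaced by something more memory-efficient.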
