Abstract

Selecting a small number of discriminative genes from thousands is a fundamental task in microarray data analysis. An effective feature selection allows biologists to investigate only a subset of genes instead of the entire set, thus avoiding insignificant, noisy, and redundant features. This paper presents the MaskedPainter feature selection method for gene expression data. The proposed method measures the ability of each gene to classify samples belonging to different classes and ranks genes by computing an overlap score. A density based technique is exploited to smooth the effects of outliers in the overlap score computation. Analogously to other approaches, the number of selected genes can be set by the user. However, our algorithm may automatically detect the minimum set of genes that yields the best classification coverage of training set samples. The effectiveness of our approach has been demonstrated through an empirical study on public microarray datasets with different characteristics. Experimental results show that the proposed approach yields a higher classification accuracy with respect to widely used feature selection techniques.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call