Abstract
A major challenge in microarray classification is that the number of features is typically orders of magnitude larger than the number of examples. In this paper, we propose a novel feature filter algorithm that selects the feature subset with maximal discriminative power and minimal redundancy by solving a quadratic objective function with binary integer constraints. To improve computational efficiency, the binary integer constraints are relaxed and a low-rank approximation of the quadratic term is applied. The proposed feature selection algorithm was also extended to multi-task microarray classification problems. We compared the single-task version of the proposed algorithm with 9 existing feature selection methods on 4 benchmark microarray data sets; the empirical results show that the proposed method achieved the most accurate predictions overall. We also evaluated the multi-task version on 8 multi-task microarray data sets, where it resulted in significantly higher accuracy than the single-task feature selection methods.
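As a rough illustration of the kind of relaxed quadratic-programming filter described above (in the spirit of QPFS), the Python sketch below ranks genes by minimizing a quadratic objective over the probability simplex. The function name quadratic_filter, the trade-off parameter alpha, the use of absolute Pearson correlations for relevance and redundancy, and the SLSQP solver are illustrative assumptions rather than the paper's implementation, and the paper's low-rank approximation of the quadratic term is omitted here.

import numpy as np
from scipy.optimize import minimize

def quadratic_filter(X, y, k, alpha=0.5):
    """Select k genes by relaxing a binary quadratic objective onto the simplex.

    X: (n_samples, n_features) expression matrix; y: numeric class labels.
    """
    n_features = X.shape[1]
    # Relevance: absolute Pearson correlation of each gene with the class label.
    f = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_features)])
    # Redundancy: pairwise absolute Pearson correlation between genes.
    Q = np.abs(np.corrcoef(X, rowvar=False))

    def objective(x):
        # Quadratic redundancy term minus linear relevance term.
        return (1 - alpha) * (x @ Q @ x) - alpha * (f @ x)

    # Binary selection indicators relaxed to non-negative weights summing to one.
    constraints = ({'type': 'eq', 'fun': lambda x: x.sum() - 1.0},)
    bounds = [(0.0, 1.0)] * n_features
    x0 = np.full(n_features, 1.0 / n_features)
    result = minimize(objective, x0, method='SLSQP',
                      bounds=bounds, constraints=constraints)
    # Rank genes by their relaxed weights and keep the top k.
    return np.argsort(result.x)[::-1][:k]

In this simplified form, the relaxed weights serve only as a ranking; the full-size similarity matrix Q makes the solve expensive for thousands of genes, which is the scalability issue the paper's low-rank approximation is designed to address.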
Highlights
Microarray technology has the ability to simultaneously measure expression levels of thousands of genes for a given biological sample, which is classified into one of several categories
The feature similarity for both QPFS and our algorithm was measured by Pearson correlation
Our algorithm is denoted ST-BIP for the single-task version and MT-BIP for the multi-task version
Summary
Microarray technology has the ability to simultaneously measure expression levels of thousands of genes for a given biological sample, which is classified into one of several categories (e.g., cancer vs. control tissues). Each sample is represented by a feature vector of gene expressions obtained from a microarray. Using a set of microarray samples with known class labels, the goal is to learn a classifier able to classify a new tissue sample based on its microarray measurements. A typical microarray classification data set contains a limited number of labeled examples, ranging from only a few to several hundred. To reduce the risk of over-fitting, a typical strategy is to select a small number of features (i.e., genes) before learning a classification model. Feature selection [1,2] has therefore become an essential technique in microarray classification.
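As a minimal sketch of this select-then-classify workflow, the following Python example uses synthetic data and a simple univariate filter from scikit-learn as a stand-in for the proposed algorithm; the data shapes, the 50-gene budget, and the specific scikit-learn components are assumptions made for illustration only.

import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5000))    # 60 labeled samples, 5000 gene expression features
y = rng.integers(0, 2, size=60)    # binary class labels (e.g., cancer vs. control)

# Select a small gene subset before fitting the classifier to limit over-fitting.
model = make_pipeline(SelectKBest(f_classif, k=50),
                      LogisticRegression(max_iter=1000))
scores = cross_val_score(model, X, y, cv=5)
print(f"cross-validated accuracy: {scores.mean():.3f}")

Any feature filter can be dropped into the first stage of such a pipeline, including a quadratic-programming filter of the kind sketched after the abstract.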