Abstract

A major challenge in microarray classification is that the number of features is typically orders of magnitude larger than the number of examples. In this paper, we propose a novel feature filter algorithm to select the feature subset with maximal discriminative power and minimal redundancy by solving a quadratic objective function with binary integer constraints. To improve the computational efficiency, the binary integer constraints are relaxed and a low-rank approximation to the quadratic term is applied. The proposed feature selection algorithm was extended to solve multi-task microarray classification problems. We compared the single-task version of the proposed feature selection algorithm with 9 existing feature selection methods on 4 benchmark microarray data sets. The empirical results show that the proposed method achieved the most accurate predictions overall. We also evaluated the multi-task version of the proposed algorithm on 8 multi-task microarray datasets. The multi-task feature selection algorithm resulted in significantly higher accuracy than when using the single-task feature selection methods.

Highlights

  • Microarray technology has the ability to simultaneously measure expression levels of thousands of genes for a given biological sample, which is classified into one of the several categories

  • The feature similarity for both QPFS and our algorithm was measured by Pearson correlation

  • Our algorithm is denoted as ST-BIP for single task version and MT-BIP for multi-task version

Read more

Summary

Introduction

Microarray technology has the ability to simultaneously measure expression levels of thousands of genes for a given biological sample, which is classified into one of the several categories (e.g., cancer vs. control tissues). Each sample is represented by a feature vector of gene expressions obtained from a microarray. Using a set of microarray samples with known class labels, the goal is to learn a classifier able to classify a new tissue sample based on its microarray measurements. A typical microarray classification data set contains a limited number of labeled examples, ranging from only a few to several hundred. To reduce the risk of over-fitting, a typical strategy is to select a small number of features (i.e., genes) before learning a classification model. Feature selection [1,2] becomes an essential technique in microarray classification

Objectives
Methods
Results
Conclusion

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.