Abstract
BackgroundGene expression profiling using high-throughput screening (HTS) technologies allows clinical researchers to find prognosis gene signatures that could better discriminate between different phenotypes and serve as potential biological markers in disease diagnoses. In recent years, many feature selection methods have been devised for finding such discriminative genes, and more recently information theoretic filters have also been introduced for capturing feature-to-class relevance and feature-to-feature correlations in microarray-based classification.MethodsIn this paper, we present and fully formulate a new multivariate filter, iRDA, for the discovery of HTS gene-expression candidate genes. The filter constitutes a four-step framework and includes feature relevance, feature redundancy, and feature interdependence in the context of feature-pairs. The method is based upon approximate Markov blankets, information theory, several heuristic search strategies with forward, backward and insertion phases, and the method is aiming at higher order gene interactions.ResultsTo show the strengths of iRDA, three performance measures, two evaluation schemes, two stability index sets, and the gene set enrichment analysis (GSEA) are all employed in our experimental studies. Its effectiveness has been validated by using seven well-known cancer gene-expression benchmarks and four other disease experiments, including a comparison to three popular information theoretic filters. In terms of classification performance, candidate genes selected by iRDA perform better than the sets discovered by the other three filters. Two stability measures indicate that iRDA is the most robust with the least variance. GSEA shows that iRDA produces more statistically enriched gene sets on five out of the six benchmark datasets.ConclusionsThrough the classification performance, the stability performance, and the enrichment analysis, iRDA is a promising filter to find predictive, stable, and enriched gene-expression candidate genes.Electronic supplementary materialThe online version of this article (doi:10.1186/s12864-015-2129-5) contains supplementary material, which is available to authorized users.
Highlights
Gene expression profiling using high-throughput screening (HTS) technologies allows clinical researchers to find prognosis gene signatures that could better discriminate between different phenotypes and serve as potential biological markers in disease diagnoses
Gene expression microarray experiments are often affected by noise that is caused by the experimental design of the underlying microarray technique, the stages of sample preparation, and the hybridisation processes of oligonucleotide probes [7]
The seven datasets have frequently been used to validate the performance of cancer classification and gene selection
Summary
Gene expression profiling using high-throughput screening (HTS) technologies allows clinical researchers to find prognosis gene signatures that could better discriminate between different phenotypes and serve as potential biological markers in disease diagnoses. Known to be an NP-complete problem [5], feature subset selection finds a subset of relevant features for the use of a model construction and looks into the minimal subset that optimises the best predictive model. This is based on the principle of parsimony [6], i.e., seeking a model that has as few as possible variables to fit the data sufficiently. The curse of dimensionality makes the process of selecting relevant genes even more challenging
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.