Statistical Approach for Biologically Relevant Gene Selection from High-Throughput Gene Expression Data.

Samarendra Das, Shesh N Rai

doi:10.3390/e22111205

Abstract

Selection of biologically relevant genes from high-dimensional expression data is a key research problem in gene expression genomics. Most available gene selection methods are based on either a relevancy or a redundancy measure, and are usually judged by post-selection classification accuracy. These methods rank genes on a single high-dimensional expression dataset, which leads to the selection of spuriously associated and redundant genes. Hence, we developed a statistical approach that combines a support vector machine (SVM) with Maximum Relevance and Minimum Redundancy (MRMR) under a sound statistical setup for the selection of biologically relevant genes. Here, genes were selected on the basis of statistical significance values computed using a nonparametric test statistic under a bootstrap-based subject sampling model. Further, a systematic and rigorous evaluation of the proposed approach against nine existing competitive methods was carried out on six different real crop gene expression datasets. This performance analysis was carried out under three comparison settings: subject classification, and biological relevance criteria based on quantitative trait loci and on gene ontology. Our analytical results showed that the proposed approach selects genes that are more biologically relevant than those selected by the existing methods. Moreover, the proposed approach was also found to perform better than the competing existing methods. The proposed statistical approach provides a framework for combining filter and wrapper methods of gene selection.
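The general idea in the abstract (resample subjects, re-rank genes on each resample, then test each gene's ranks with a nonparametric statistic) can be illustrated with a minimal sketch. This is not the authors' procedure. The relevance score below is a simple standardized mean difference standing in for the SVM-MRMR weights, the one-sided sign test is only one possible nonparametric test, and all function names, the `cutoff` parameter, and the toy data are assumptions made for illustration.

```python
import math
import random
from statistics import mean, pvariance


def relevance_scores(X, y):
    """Toy filter score per gene: |difference of class means| / pooled SD.

    Assumption: a stand-in for the SVM-MRMR gene weights named in the abstract.
    """
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, lab in zip(X, y) if lab == 0]
        b = [row[g] for row, lab in zip(X, y) if lab == 1]
        pooled_sd = math.sqrt((pvariance(a) + pvariance(b)) / 2) + 1e-9
        scores.append(abs(mean(a) - mean(b)) / pooled_sd)
    return scores


def rank_scores(scores):
    """Map scores to (0, 1]: the best gene gets 1, the worst gets 1/G."""
    order = sorted(range(len(scores)), key=lambda g: scores[g])
    out = [0.0] * len(scores)
    for position, g in enumerate(order, start=1):
        out[g] = position / len(scores)
    return out


def sign_test_pvalue(k, n):
    """One-sided sign test: P(Binomial(n, 0.5) >= k)."""
    return sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n


def bootstrap_gene_pvalues(X, y, n_boot=200, cutoff=0.5, seed=0):
    """For each gene, test whether its rank score tends to exceed `cutoff`."""
    rng = random.Random(seed)
    idx0 = [i for i, lab in enumerate(y) if lab == 0]
    idx1 = [i for i, lab in enumerate(y) if lab == 1]
    exceed = [0] * len(X[0])
    for _ in range(n_boot):
        # Resample subjects with replacement within each class, so that
        # every bootstrap sample contains both classes.
        sample = rng.choices(idx0, k=len(idx0)) + rng.choices(idx1, k=len(idx1))
        Xb = [X[i] for i in sample]
        yb = [y[i] for i in sample]
        for g, r in enumerate(rank_scores(relevance_scores(Xb, yb))):
            if r > cutoff:
                exceed[g] += 1
    return [sign_test_pvalue(k, n_boot) for k in exceed]


def make_toy_data(n_per_class=10, n_genes=10, shift=4.0, seed=1):
    """Hypothetical data: only gene 0 differs between the two classes."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            row[0] += shift * label
            X.append(row)
            y.append(label)
    return X, y


X, y = make_toy_data()
pvalues = bootstrap_gene_pvalues(X, y)
print([f"{p:.3g}" for p in pvalues])
```

On the toy data, gene 0 should receive a very small p-value because it ranks near the top in essentially every resample. Note that the bootstrap resamples are drawn from one dataset and are not independent, so p-values from this simplified test are only a rough guide.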

Highlights

The emergence of high-throughput sequencing technologies has exponentially increased the size of output data in genome sciences with respect to the number of features [1].
The distributions of weights computed from the Support Vector Machine (SVM)-Maximum Relevance and Minimum Redundancy (MRMR) method [13] and of adj. p-values for genes computed from the proposed BSM approach for abiotic and biotic stresses in rice are shown in Figure 2 and Figure S3, respectively.
From the distribution of adj. p-values computed through the proposed approach, it was observed that the relevant genes were well separated from the irrelevant genes, and that a small number of genes were found to be statistically significant (Figure 2(A1)).
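The highlights refer to adjusted p-values without stating how the adjustment is made. As a hedged illustration only, the sketch below applies the Benjamini-Hochberg procedure, one common multiple-testing correction, and keeps genes whose adjusted p-value falls below a threshold. The choice of Benjamini-Hochberg, the `alpha` value, and the gene identifiers are assumptions, not details taken from the paper.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_genes(gene_ids, pvalues, alpha=0.05):
    """Keep genes whose adjusted p-value is at most `alpha`."""
    adjusted = benjamini_hochberg(pvalues)
    return [g for g, q in zip(gene_ids, adjusted) if q <= alpha]


# Hypothetical gene identifiers and raw p-values, for illustration only.
genes = ["g1", "g2", "g3", "g4"]
raw = [0.001, 0.8, 0.02, 0.5]
print(select_genes(genes, raw))
```

A histogram of such adjusted values is one way to inspect the kind of separation between relevant and irrelevant genes that the highlight describes.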

Summary

Introduction

The emergence of high-throughput sequencing technologies has exponentially increased the size of output data in genome sciences with respect to the number of features [1]. GEO, ArrayExpress, etc., over the years by researchers across the globe [4,5]. These publicly available high-throughput data need to be analyzed in order to gain valid biological insights. One aspect of this research is to select the genes that are highly relevant to the phenotype/trait under study from the several thousand genes in the data. This is called feature selection in machine learning in general and gene selection in genomics [5,6,7]. The selected genes can be used as predictors in other predictive analyses, e.g., subject classification [7,8,11], gene regulation modeling [14], and gene network analysis [5,6], which enhances the stability, power and feasibility of the developed models [15].



Journal: Entropy (Basel, Switzerland)
Publication Date: Oct 25, 2020
Citations: 7
License type: CC BY 4.0


