A Feature Sampling Strategy for Analysis of High Dimensional Genomic Data.

Jie Zhang, Kai Zhang, Zhi Wei, Zhigen Zhao

doi:10.1109/tcbb.2017.2779492

Abstract

With the development of high-throughput technology, it has become feasible and common to profile tens of thousands of gene activities simultaneously. These genomic data typically have sample sizes of hundreds or fewer, far smaller than the feature size (the number of genes). In addition, genes, particularly those from the same pathway, are often highly correlated. These issues make it very challenging to select meaningful genes from a large number of (correlated) candidates in many genomic studies. Quite a few methods have been proposed to address this challenge. Among them, regularization-based techniques such as the lasso have become especially appealing because they perform model fitting and variable selection at the same time. However, lasso regression has known limitations. One is that the number of genes selected by the lasso cannot exceed the number of samples. Another is that, if causal genes are highly correlated, the lasso tends to select only one or a few of them. Biologists, however, wish to identify them all. To overcome these limitations, we present a novel, robust, and stable variable selection method. Through simulation studies and a real application to transcriptome data, we demonstrate the superiority of the proposed method in selecting highly correlated causal genes. We also provide some theoretical justification for this feature sampling strategy based on mean and variance analyses.
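The abstract does not spell out the sampling procedure, so the following is only a rough illustration of the general idea of pairing the lasso with random feature subsets, not the authors' method. It is a minimal sketch that assumes a plain coordinate-descent lasso, uniformly sampled feature subsets, and a per-feature selection frequency as the output. The function names (`lasso_cd`, `selection_frequency`), the toy data, and all parameter values (`lam`, `n_rounds`, `subset_size`) are hypothetical choices made for this example.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator used by the lasso coordinate update."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/(2n))*||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float)  # residual y - Xb (a copy; b starts at zero)
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            r += X[:, j] * b[j]  # remove feature j's current contribution
            rho = X[:, j] @ r / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * b[j]  # add the updated contribution back
    return b


def selection_frequency(X, y, lam, n_rounds=50, subset_size=20, seed=0):
    """Fit the lasso on random feature subsets (an assumed scheme, not the
    paper's) and return, per feature, the fraction of rounds in which it was
    selected among the rounds in which it was sampled."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    sampled = np.zeros(p)
    selected = np.zeros(p)
    for _ in range(n_rounds):
        idx = rng.choice(p, size=subset_size, replace=False)
        b = lasso_cd(X[:, idx], y, lam, n_iter=50)
        sampled[idx] += 1
        selected[idx] += (b != 0)
    # Features that were never sampled get frequency 0.
    return np.divide(selected, sampled, out=np.zeros(p), where=sampled > 0)


# Toy data (hypothetical): n << p, with three highly correlated causal features.
rng = np.random.default_rng(1)
n, p = 40, 200
X = rng.standard_normal((n, p))
z = rng.standard_normal(n)
X[:, :3] = z[:, None] + 0.1 * rng.standard_normal((n, 3))
y = X[:, :3].sum(axis=1) + 0.5 * rng.standard_normal(n)
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize columns
y = y - y.mean()                          # center the response

freq = selection_frequency(X, y, lam=0.5)
```

Because each fit sees only a small subset of features, the correlated causal genes are rarely all in the same fit and so compete with one another less often, and the total number of features with nonzero frequency is not tied to the sample size. How subsets should be drawn and how frequencies should be thresholded are left open here.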

Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Publication Date: Dec 4, 2017
Citations: 5

