Variable Selection in Heterogeneous Datasets: A Truncated-rank Sparse Linear Mixed Model with Applications to Genome-wide Association Studies.

Haohan Wang, Eric P Xing, Bryon Aragam

doi:10.1109/bibm.2017.8217687

Abstract

A fundamental and important challenge in modern datasets of ever-increasing dimensionality is variable selection, which has taken on renewed interest recently due to the growth of biological and medical datasets with complex, non-i.i.d. structures. Naïvely applying classical variable selection methods such as the Lasso to such datasets may lead to a large number of false discoveries. Motivated by genome-wide association studies in genetics, we study the problem of variable selection for datasets arising from multiple subpopulations, when this underlying population structure is unknown to the researcher. We propose a unified framework for sparse variable selection that adaptively corrects for population structure via a low-rank linear mixed model. Most importantly, the proposed method does not require prior knowledge of individual relationships in the data and adaptively selects a covariance structure of the correct complexity. Through extensive experiments, we illustrate the effectiveness of this framework over existing methods. Further, we test our method on three different genomic datasets from plants, mice, and humans, and discuss the knowledge we discover with our model.
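The general idea in the abstract (correct for population structure with a low-rank covariance, then run sparse selection) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the fixed `rank` and `delta` parameters, and the plain proximal-gradient Lasso solver are all assumptions made for illustration, and the adaptive choice of covariance complexity described in the abstract is not reproduced here.

```python
import numpy as np


def truncated_kinship(X, rank):
    """Top-`rank` eigenvectors/eigenvalues of K = X X^T / p, from the SVD of X."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    eigvals = s ** 2 / X.shape[1]
    return U[:, :rank], eigvals[:rank]


def whiten(U, eigvals, delta, M):
    """Return (K_r + delta*I)^(-1/2) @ M, where K_r = U diag(eigvals) U^T."""
    proj = U.T @ M                              # coordinates inside the low-rank subspace
    scale = 1.0 / np.sqrt(eigvals + delta)
    inside = U @ (proj.T * scale).T             # rescaled component inside the subspace
    outside = (M - U @ proj) / np.sqrt(delta)   # component orthogonal to the subspace
    return inside + outside


def lasso_ista(X, y, lam, n_iter=500):
    """Minimise (1/2n)||y - Xb||^2 + lam*||b||_1 by proximal gradient descent."""
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2        # 1 / Lipschitz constant of the gradient
    b = np.zeros(p)
    for _ in range(n_iter):
        z = b - step * (X.T @ (X @ b - y) / n)
        b = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold
    return b


def select_variables(X, y, rank, delta, lam):
    """Whiten X and y with a truncated-rank covariance, then fit the Lasso.

    `rank`, `delta`, and `lam` are hypothetical tuning parameters fixed by the
    caller; they are not estimated from the data in this sketch.
    """
    U, eigvals = truncated_kinship(X, rank)
    Xw = whiten(U, eigvals, delta, X)
    yw = whiten(U, eigvals, delta, y)
    beta = lasso_ista(Xw, yw, lam)
    return np.flatnonzero(beta), beta
```

Calling `select_variables(X, y, rank, delta, lam)` on an `n × p` genotype matrix `X` and a length-`n` phenotype vector `y` returns the indices of the nonzero coefficients together with the full coefficient vector. Keeping only the top `rank` eigen-directions means the correction removes the dominant shared structure among samples without requiring a known relationship matrix.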


Journal: Proceedings. IEEE International Conference on Bioinformatics and Biomedicine
Publication Date: Nov 1, 2017
Citations: 58

Similar Papers

Variable selection in heterogeneous datasets: A truncated-rank sparse linear mixed model with applications to genome-wide association studies.
Haohan Wang ... Bryon Aragam
Methods | VOL. 145
27 Apr 2018

Polygenic inheritance, GWAS, polygenic risk scores, and the search for functional variants
Daniel J M Crouch ... Walter F Bodmer
Proceedings of the National Academy of Sciences of the United States of America | VOL. 117
04 Aug 2020

Genetic Studies: The Linear Mixed Models in Genome-wide Association Studies
Gengxin Li ... Hongjiang Zhu
The Open Bioinformatics Journal | VOL. 7
13 Dec 2013

Genetic Association Studies: From “Searching Under the Lamppost” to “Fishing in the Pond”
Hashem B El-Serag ... Nandita Mitra
Gastroenterology | VOL. 134
01 Mar 2008
