Joint screening of ultrahigh dimensional variables for family-based genetic studies

Subha Datta, Yixin Fang, Ji Meng Loh

doi:10.1186/s12919-018-0120-2


Open Access

https://doi.org/10.1186/s12919-018-0120-2


Journal: BMC Proceedings	Publication Date: Sep 1, 2018
Citations: 1	License type: open-access

Affiliation: New Jersey Institute of Technology

Abstract

Background: Mixed models are a useful tool for evaluating the association between an outcome variable and genetic variables in a family-based genetic study, because they take the kinship coefficients into account. When the genetic variables are ultrahigh dimensional (i.e., p ≫ n), it is challenging to fit any mixed effect model.

Methods: We propose a two-stage strategy. The first stage screens the genetic variables, and the second stage fits the mixed effect model to the variables that survive the screening. For the screening stage, one can use the sure independence screening (SIS) procedure, which fits the mixed effect model to one genetic variable at a time. SIS may miss genetic variables that are marginally unimportant but jointly important, so we propose a joint screening (JS) procedure that screens all the genetic variables simultaneously. We evaluate the performance of the proposed JS procedure through a simulation study and an application to the GAW20 data.

Results: We apply the proposed JS procedure to the GAW20 representative simulated data set (n = 680 participants and p = 463,995 cytosine-phosphate-guanine [CpG] sites) and select the top d = ⌊n/log(n)⌋ variables. We then fit the mixed model using these top variables. At the 5% significance level, 43 CpG sites are found to be significant. Diagnostic analyses based on the residuals indicate that the fitted mixed model is appropriate.

Conclusions: The GAW20 data set is ultrahigh dimensional and family-based, with within-group variance. Despite this, we were able to perform subset selection with a two-stage strategy that is computationally simple and easy to understand.
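A worked check of the screening size uses the natural logarithm. With n = 680, d = ⌊680 / ln 680⌋ = ⌊680 / 6.522⌋ = ⌊104.26⌋ = 104, which matches the d = 104 selected variables reported in the Highlights. The Python sketch below illustrates the two-stage screening idea on synthetic data, under several stated assumptions:

- The SIS step is simplified to plain marginal correlation and ignores kinship. The paper's SIS fits a mixed model to each variable.
- The joint step is swapped for a different, named technique: a ridge-regularized HOLP-type estimate (high-dimensional ordinary least-squares projection). It is not the paper's JS estimate (9).
- All function names are hypothetical.

```python
import numpy as np


def screening_size(n: int) -> int:
    """Screening size d = floor(n / log n), using the natural logarithm."""
    return int(np.floor(n / np.log(n)))


def _standardize(X, y):
    """Center and scale each column of X; center y."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    return Xs, y - y.mean()


def marginal_screen(X, y, d):
    """SIS-style marginal screening (simplified: plain correlation, no kinship).

    Returns the indices of the d columns with the largest |corr(x_j, y)|.
    """
    Xs, yc = _standardize(X, y)
    scores = np.abs(Xs.T @ yc)  # proportional to |corr(x_j, y)|
    return np.argsort(-scores)[:d]


def joint_screen(X, y, d, ridge=1.0):
    """Joint screening stand-in: ridge-regularized HOLP-type estimate.

    beta = X^T (X X^T + ridge * I_n)^{-1} y uses all p columns at once and
    needs only an n x n solve. This is NOT the paper's JS estimate (9).
    The ridge term keeps the n x n system invertible after column centering.
    """
    Xs, yc = _standardize(X, y)
    n = Xs.shape[0]
    beta = Xs.T @ np.linalg.solve(Xs @ Xs.T + ridge * np.eye(n), yc)
    return np.argsort(-np.abs(beta))[:d]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 100, 2000
    X = rng.standard_normal((n, p))
    # Column 2 is built so that cov(x_2, y) = 0 while its coefficient is -1:
    # it is marginally unimportant but jointly important.
    X[:, 2] = 0.5 * (X[:, 0] + X[:, 1]) + np.sqrt(0.5) * rng.standard_normal(n)
    y = X[:, 0] + X[:, 1] - X[:, 2] + 0.5 * rng.standard_normal(n)

    d = screening_size(n)
    print("d =", d)
    print("column 2 kept by marginal screening:", 2 in marginal_screen(X, y, d))
    print("column 2 kept by joint screening:   ", 2 in joint_screen(X, y, d))
```

The script prints d and reports whether each screen keeps column 2. That column is constructed to be uncorrelated with y in expectation while having a nonzero coefficient. The outcome depends on the random draw, so read it as an illustration of the failure mode that motivates joint screening rather than as a guarantee.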

Highlights

Mixed models are a useful tool for evaluating the association between an outcome variable and genetic variables from a family-based genetic study, taking into account the kinship coefficients
We compute the joint screening (JS) estimate (9), using the GAW20 representative simulated data set with n = 680 observations and p = 463,995 cytosine-phosphate-guanine dinucleotide (CpG) sites
Results from stage 2: we perform the mixed model analysis (1), using the GAW20 representative simulated data set with n = 680 observations and d = 104 selected genetic variables plus other important risk factors, namely, age, gender, smoking, and metabolic syndrome
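Stage 2 can be sketched as a linear mixed model with a kinship-based random effect. The form used here is an assumption, because the paper's model (1) is not reproduced in this summary:

y = Xβ + g + ε, with g ~ N(0, σ²·h·K) and ε ~ N(0, σ²·(1 − h)·I).

In this form, X holds an intercept, the risk factors, and the d selected CpG sites. K is a relationship matrix built from the kinship coefficients, for example twice the kinship matrix.

The hypothetical function `fit_kinship_lmm` works in three steps:

- It rotates the data by the eigenvectors of K, a common trick in EMMA/FaST-LMM-style software.
- It profiles the maximum likelihood over h on a grid.
- It reports Wald p-values using a normal approximation.

```python
import math

import numpy as np


def fit_kinship_lmm(y, X, K, grid=None):
    """Profile-ML fit of y = X b + g + e (hypothetical helper, assumed model form).

    g ~ N(0, s2 * h * K) and e ~ N(0, s2 * (1 - h) * I), where K is a
    kinship-based relationship matrix. For each h on the grid, b and s2 have
    closed forms; the h with the largest log-likelihood is kept.
    Returns (b, standard errors, two-sided normal-approximation p-values, h).
    """
    if grid is None:
        grid = np.linspace(0.0, 0.99, 100)
    n = len(y)
    evals, U = np.linalg.eigh(K)           # K = U diag(evals) U^T
    evals = np.clip(evals, 0.0, None)      # guard against tiny negative eigenvalues
    yt, Xt = U.T @ y, U.T @ X              # rotation makes the covariance diagonal
    best = None
    for h in grid:
        w = 1.0 / (h * evals + (1.0 - h))  # diagonal of the rotated V^{-1}, up to s2
        XtW = Xt.T * w                     # scales column i of Xt.T by w[i]
        A = XtW @ Xt
        b = np.linalg.solve(A, XtW @ yt)   # GLS estimate for this h
        r = yt - Xt @ b
        s2 = np.sum(w * r**2) / n          # ML estimate of the variance scale
        # Log-likelihood up to a constant: -0.5 * (log|V| + r' V^{-1} r)
        loglik = -0.5 * (n * np.log(s2) - np.sum(np.log(w)) + n)
        if best is None or loglik > best[0]:
            best = (loglik, h, b, s2, A)
    _, h, b, s2, A = best
    se = np.sqrt(np.diag(s2 * np.linalg.inv(A)))  # Cov(b) = s2 * A^{-1}
    pvals = np.array([math.erfc(abs(t) / math.sqrt(2.0)) for t in b / se])
    return b, se, pvals, h


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    n_fam, size = 50, 4
    # Full siblings: 2 * kinship = 1 on the diagonal and 0.5 off the diagonal.
    B = 0.5 * np.ones((size, size)) + 0.5 * np.eye(size)
    K = np.kron(np.eye(n_fam), B)
    n = n_fam * size
    X = np.column_stack([np.ones(n), rng.standard_normal(n), rng.standard_normal(n)])
    g = np.linalg.cholesky(K) @ rng.standard_normal(n)
    y = X @ np.array([1.0, 2.0, 0.0]) + g + 0.5 * rng.standard_normal(n)
    b, se, pvals, h = fit_kinship_lmm(y, X, K)
    print("b =", b, "se =", se, "p =", pvals, "h =", h)
```

When K is the identity, every h yields the same weights and the fit reduces to ordinary least squares, which makes a convenient sanity check. Possible refinements include a REML criterion, a finer or adaptive search over h, and a multiple-testing correction across the d CpG sites. The sketch does not attempt to reproduce the reported count of 43 significant sites.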

Summary

Introduction

Mixed models are a useful tool for evaluating the association between an outcome variable and genetic variables in a family-based genetic study, because they take the kinship coefficients into account. When the genetic variables are ultrahigh dimensional (i.e., p ≫ n), it is challenging to fit any mixed effect model. Genome-wide investigation of DNA sequence variation in blood lipids has been studied extensively, but genome-wide epigenetic investigation has been far less explored. To fill this gap, the Genetics of Lipid Lowering Drugs and Diet Network (GOLDN) study conducted an epigenome-wide association study to uncover epigenetic factors that influence lipid metabolism [1]. The number of genetic variables (CpG sites) is ultrahigh. The pre-treatment values are measured at visits 1 and 2, and the post-treatment values are measured at visits 3 and 4.


Similar Papers

Feature Screening via Distance Correlation Learning
Runze Li ... Liping Zhu
Journal of the American Statistical Association, Vol. 107 (01 Jun 2012)

High‐dimensional variable screening under multicollinearity
Naifei Zhao ... Hong Wang
Stat, Vol. 9 (01 Jan 2020)

Bayesian Subset Modeling for High-Dimensional Generalized Linear Models
Faming Liang ... Kai Yu
Journal of the American Statistical Association, Vol. 108 (01 Jun 2013)

A Combined Feature Screening Approach of Random Forest and Filter-based Methods for Ultra-high Dimensional Data
Lifeng Zhou ... Hong Wang
Current Bioinformatics, Vol. 17 (01 May 2022)