Conditional Random Fields for Fast, Large-Scale Genome-Wide Association Studies

Jim C Huang, David Heckerman, Christopher Meek, Carl Kadie

doi:10.1371/journal.pone.0021591

Open Access

Journal: PLoS ONE
Publication Date: Jul 12, 2011
Citations: 20
License type: CC BY 4.0

Affiliation: Microsoft (United States)

Abstract

Understanding the role of genetic variation in human diseases remains an important problem in genomics. An important component of such variation consists of variations at single sites in DNA, or single nucleotide polymorphisms (SNPs). Typically, the problem of associating particular SNPs with phenotypes has been confounded by hidden factors such as population structure, family structure or cryptic relatedness in the sample of individuals being analyzed. Such confounding factors lead to a large number of spurious associations and missed associations. Various statistical methods have been proposed to account for such confounding factors, such as linear mixed-effect models (LMMs) or methods that adjust data based on a principal components analysis (PCA), but these methods either suffer from low power or cease to be tractable for larger numbers of individuals in the sample. Here we present a statistical model for conducting genome-wide association studies (GWAS) that accounts for such confounding factors. The runtime of our method scales quadratically in the number of individuals being studied, with only a modest loss in statistical power compared to LMM-based and PCA-based methods when tested on synthetic data generated from a generalized LMM. Applying our method to both real and synthetic human genotype/phenotype data, we demonstrate the ability of our model to correct for confounding factors while requiring significantly less runtime than LMMs. We have implemented methods for fitting these models, which are available at http://www.microsoft.com/science.

Highlights

Population structure, family structure and/or cryptic relatedness are well-known confounding factors that cause spurious associations to be found in genome-wide association studies (GWAS) [1,2,3,4,5,6].
We have presented a novel GWAS method that accounts for confounding factors such as population structure, family structure or cryptic relatedness.
Like linear mixed-effect models (LMMs) and principal components analysis (PCA)-based methods for association, our model accounts for confounding factors through the use of pairwise similarities between patients, which allows us to significantly reduce false positive rates when performing associations.
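
To make the idea of pairwise similarities between individuals concrete, the following is a minimal sketch of one common choice: a genotype-based similarity matrix built from standardized SNP columns. It is an illustration only and is not claimed to be the similarity measure used in the paper; the function name `similarity_matrix` and the toy genotypes are hypothetical.

```python
from statistics import mean, pstdev


def similarity_matrix(genotypes):
    """Pairwise similarity between individuals from minor-allele counts.

    genotypes: list of individuals, each a list of allele counts (0, 1 or 2),
    one entry per SNP. Illustrative assumption: each SNP is standardized to
    zero mean and unit variance, then similarities are averaged over SNPs.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    columns = []
    for j in range(m):
        col = [g[j] for g in genotypes]
        mu, sd = mean(col), pstdev(col)
        if sd == 0:
            continue  # a monomorphic SNP carries no similarity information
        columns.append([(x - mu) / sd for x in col])
    if not columns:
        raise ValueError("no polymorphic SNPs to compare")
    k = len(columns)
    return [
        [sum(c[a] * c[b] for c in columns) / k for b in range(n)]
        for a in range(n)
    ]


# Toy data: individuals 0 and 1 share identical genotypes.
toy = [[0, 1, 2], [0, 1, 2], [2, 1, 0], [2, 0, 0]]
K = similarity_matrix(toy)
```

In this toy example the entry for the two identical individuals, `K[0][1]`, is larger than the entry for a dissimilar pair such as `K[0][2]`. Building such a matrix costs time quadratic in the number of individuals.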

Summary

Introduction

Population structure, family structure and/or cryptic relatedness are well-known confounding factors that cause spurious associations to be found in GWAS [1,2,3,4,5,6]. Other methods have been proposed that use a principal components analysis of individuals’ SNPs [4], perform a post-hoc correction of test statistics such as Genomic Control [2], or cluster individuals before performing an aggregate association between clusters and phenotypes [11]. These methods, while accounting for confounding factors under different assumptions, have been shown either to suffer from insufficient statistical power when the confounding effects are strong [4,5] or to be unable to fully capture those effects, such that many false positives are produced [3,5,12]. In several recent studies [3,5,12,13], methods based on LMMs were found to produce fewer false positives and to have higher statistical power than other methods for modeling confounding factors, making LMMs a popular class of GWAS methods that combine high statistical power with low false positive rates.
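
As an illustration of the post-hoc correction mentioned above, here is a minimal sketch of Genomic Control as it is commonly described: estimate an inflation factor from the median of the 1-degree-of-freedom chi-square test statistics, then divide every statistic by it. The function name `genomic_control`, the flooring of the factor at 1 and the toy statistics are assumptions made for this example, not details taken from the paper.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control(chi2_stats):
    """Return (inflation_factor, corrected_stats) for 1-df chi-square statistics."""
    lam = median(chi2_stats) / CHI2_1DF_MEDIAN
    # Assumption: floor the factor at 1 so statistics are never inflated.
    lam = max(lam, 1.0)
    return lam, [s / lam for s in chi2_stats]


# Toy association statistics, one per SNP (made-up values).
stats = [0.9, 1.8, 0.5, 3.2, 12.0]
lam, corrected = genomic_control(stats)
```

After the correction, the median of the adjusted statistics matches the median expected under the null. Because one global factor rescales every SNP equally, strong confounding can cost statistical power, which is the limitation the paragraph above describes.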
