A comprehensive survey of genetic variation in 20,691 subjects from four large cohorts.

Sara Lindström, Peter Kraft, Andrew T Chan, Hongyan Huang, Rulla M Tamimi, Liming Liang, David J Hunter, Stephanie Loomis, Immaculata De Vivo, Meir J Stampfer, Jinyan Huang, A Heather Eliassen, Christopher Kabrhel, Jae H Kang, Constance Turman, Susan E Hankinson, Louis R Pasquale, Hugues Aschard, Michael Gaziano, Shelley S Tworoger, Janey L Wiggs, Gary C Curhan, Hyon K Choi, Majken K Jensen, Eric B Rimm, Marilyn C Cornelis, Frank B Hu, Charles S Fuchs

doi:10.1371/journal.pone.0173997

Abstract

The Nurses’ Health Study (NHS), Nurses’ Health Study II (NHSII), Health Professionals Follow-up Study (HPFS) and the Physicians’ Health Study (PHS) have collected detailed longitudinal data on multiple exposures and traits for approximately 310,000 study participants over the last 35 years. Over 160,000 study participants across the cohorts have donated a DNA sample, and to date 20,691 subjects have been genotyped as part of genome-wide association studies (GWAS) of twelve primary outcomes. However, these studies utilized six different GWAS arrays, making it difficult to conduct analyses of secondary phenotypes or share controls across studies. To allow for secondary analyses of these data, we have created three new datasets merged by platform family and performed imputation using a common reference panel, the 1,000 Genomes Phase I release. Here, we describe the methodology behind the data merging and imputation and present imputation quality statistics and association results from two GWAS of secondary phenotypes (body mass index (BMI) and venous thromboembolism (VTE)). We observed the strongest BMI association for the FTO SNP rs55872725 (β = 0.45, p = 3.48 × 10⁻²²), and using a significance level of p = 0.05, we replicated 19 out of 32 known BMI SNPs. For VTE, we observed the strongest association for the rs2040445 SNP (OR = 2.17, 95% CI: 1.79–2.63, p = 2.70 × 10⁻¹⁵), located downstream of F5, and also observed significant associations for the known ABO and F11 regions. This pooled resource can be used to maximize power in GWAS of phenotypes collected across the cohorts and for studying gene-environment interactions as well as rare phenotypes and genotypes.
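
As a rough consistency check on the VTE result reported above, the Wald z statistic and two-sided p-value can be approximately recovered from the published odds ratio and confidence interval. The sketch below assumes the interval is a symmetric Wald interval on the log-odds scale, which the abstract does not state; the function name `wald_p_from_or_ci` is hypothetical and not taken from the paper.

```python
import math


def wald_p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover the Wald z statistic, standard error, and two-sided p-value
    from an odds ratio and its 95% confidence interval.

    Assumption: the CI was built as log(OR) +/- z_crit * SE.
    """
    log_or = math.log(odds_ratio)
    # The width of the CI on the log scale equals 2 * z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_or / se
    # Two-sided normal tail probability; erfc stays accurate for tiny tails.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, se, p


# Rounded values for rs2040445 as reported in the abstract.
z, se, p = wald_p_from_or_ci(2.17, 1.79, 2.63)
print(f"z = {z:.2f}, SE = {se:.4f}, p = {p:.2e}")
```

With the rounded values from the abstract this gives a z statistic near 7.9 and a p-value on the order of 10⁻¹⁵, in line with the reported figure. Exact agreement is not expected, because the published OR and CI are rounded.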

Highlights

Large, well-phenotyped cohort studies have constituted the backbone of epidemiology for several decades.
Collected longitudinal information on exposures and outcomes enables a broad spectrum of analyses and has led to novel insights into disease etiology, such as the link between smoking and lung cancer [1,2] as well as the link between both high cholesterol levels and trans fatty acids with coronary heart disease [3,4].
Many existing cohorts collect biological specimens from their participants, allowing for studies of inherited genetic variation as well as prospectively measured biomarkers such as metabolomic profiles [5] and circulating hormone levels [6].
The average imputation quality score by minor allele frequency (MAF) for each platform family is shown in Fig 2, and the distribution of imputation quality scores for rare (MAF < 0.01) variants is shown in S1 Table.
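
The kind of summary described in the last highlight, an average imputation quality score per minor allele frequency bin, can be sketched as follows. This is a minimal illustration only: the bin edges, the function name `mean_quality_by_maf`, and the toy values are all hypothetical, and the excerpt does not say which quality metric or bins the authors used.

```python
from collections import defaultdict

# Hypothetical MAF bin edges; the paper's actual bins are not given here.
MAF_BINS = [(0.0, 0.01), (0.01, 0.05), (0.05, 0.5)]


def mean_quality_by_maf(variants, bins=MAF_BINS):
    """Average a per-variant quality score within each MAF bin.

    variants: iterable of (maf, quality) pairs.
    Returns a dict mapping (low, high) bin -> mean quality for non-empty bins.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    top_edge = bins[-1][1]
    for maf, quality in variants:
        for low, high in bins:
            # Bins are half-open [low, high); the top bin also takes MAF == top_edge.
            if low <= maf < high or (high == top_edge and maf == top_edge):
                sums[(low, high)] += quality
                counts[(low, high)] += 1
                break
    return {b: sums[b] / counts[b] for b in bins if counts[b]}


# Toy (MAF, quality) pairs made up for illustration, not data from the study.
toy_variants = [(0.004, 0.40), (0.008, 0.60), (0.03, 0.80), (0.20, 0.95), (0.50, 0.99)]
for maf_bin, mean_q in mean_quality_by_maf(toy_variants).items():
    print(maf_bin, round(mean_q, 3))
```

Running the same function once per platform family would give one curve per family, which is the shape of comparison the highlight refers to.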

Summary

Introduction

Well-phenotyped cohort studies have constituted the backbone of epidemiology for several decades. Genome-wide association studies (GWAS) are currently a main engine of genetic epidemiology and have led to the identification of thousands of loci for hundreds of traits (for an overview of GWAS and their clinical applications, see Manolio [7]). When designing a GWAS, cost is still the determining factor, and GWAS within cohorts are often conducted within nested case-control studies or sub-cohorts. The Women’s Genome Health Study (WGHS) [8] genotyped the entire cohort of 27,000 women, and the Genetic Epidemiology Research on Adult Health and Aging (GERA) Cohort has generated GWAS data on almost 100,000 individuals [9]. In many instances, GWAS are tied to specific funding sources acquired for studying a pre-defined outcome, and only a small fraction of the cohort is genotyped at a specific time.

Journal: PLoS ONE
Publication Date: Mar 16, 2017
Citations: 62
License type: CC BY 4.0
