Abstract

Epidemiologic collections have been a major resource for genotype–phenotype studies of complex disease given their large sample size, racial/ethnic diversity, and breadth and depth of phenotypes, traits, and exposures. A major disadvantage of these collections is that they often survey households and communities without collecting extensive pedigree data. Failure to account for substantial relatedness can lead to inflated estimates and spurious associations. To examine the extent of cryptic relatedness in an epidemiologic collection, we, as the Epidemiologic Architecture for Genes Linked to Environment (EAGLE) study, accessed the National Health and Nutrition Examination Surveys (NHANES) linked to DNA samples (“Genetic NHANES”) from NHANES III and NHANES 1999–2002. NHANES are population-based cross-sectional surveys conducted by the National Center for Health Statistics at the Centers for Disease Control and Prevention. Genome-wide genetic data are not yet available in NHANES, and current data use agreements prohibit the generation of GWAS-level data in NHANES samples due to issues in maintaining confidentiality, among other ethical concerns. To date, only hundreds of single nucleotide polymorphisms (SNPs) genotyped in a variety of candidate genes are available for analysis in NHANES. We estimated identity-by-descent (IBD) sharing in three self-identified subpopulations of Genetic NHANES (non-Hispanic white, non-Hispanic black, and Mexican American) using PLINK software to identify potential familial relationships among presumed unrelated subjects. We then compared the PLINK-identified relationships to those identified by an alternative method implemented in Kinship-based INference for Genome-wide association studies (KING). Overall, both methods identified familial relationships in NHANES III and NHANES 1999–2002 for all three subpopulations, but little concordance was observed between the two methods, due in major part to the limited SNP data available in Genetic NHANES. Despite the lack of genome-wide data, our results suggest the presence of cryptic relatedness in this epidemiologic collection and highlight the limitations of restricted datasets such as NHANES in the context of modern-day genetic epidemiology studies.

Highlights

  • Epidemiologic cohorts are a valuable resource for genotype–phenotype studies given their large sample size, racial/ethnic diversity, and wealth of phenotypes, traits, and exposures

  • We identified cryptic relatedness in the large cross-sectional NHANES surveys using both PLINK and KING, and call attention to the potential for hidden familial relationships in epidemiologic cohorts accessed for genetic association studies

  • Kinship coefficient ranges for parent/child and full-sibling pairs in Kinship-based INference for Genome-wide association studies (KING) are equal, because the kinship coefficient in that software does not further discriminate among first-degree relationships
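
The last highlight can be illustrated with a small sketch. It is not the study's pipeline: the function names are hypothetical, the cut-offs are the kinship-coefficient inference ranges given in the KING documentation, and the IBD-sharing probabilities are the theoretical expectations for each relationship rather than estimates from NHANES data. It shows that a parent/child pair and a full-sibling pair have the same expected kinship coefficient (0.25) and therefore fall into the same first-degree bin, whereas PLINK-style IBD probabilities (Z0, Z1, Z2) differ between the two.

```python
# Illustrative sketch only; names are hypothetical, not taken from PLINK or KING.

def pi_hat(z1: float, z2: float) -> float:
    """Proportion of the genome shared IBD: P(IBD=2) + 0.5 * P(IBD=1)."""
    return z2 + 0.5 * z1

def kinship_from_pi_hat(p: float) -> float:
    """The kinship coefficient is half the proportion shared IBD."""
    return p / 2.0

def classify_kinship(phi: float) -> str:
    """Bin a kinship coefficient using the ranges in the KING documentation."""
    if phi > 0.354:
        return "duplicate/MZ twin"
    if phi > 0.177:
        return "first-degree"
    if phi > 0.0884:
        return "second-degree"
    if phi > 0.0442:
        return "third-degree"
    return "unrelated"

# Theoretical IBD-sharing probabilities (Z0, Z1, Z2) for two first-degree pairs.
EXPECTED_IBD = {
    "parent/child": (0.0, 1.0, 0.0),
    "full sibling": (0.25, 0.5, 0.25),
}

if __name__ == "__main__":
    for relationship, (z0, z1, z2) in EXPECTED_IBD.items():
        phi = kinship_from_pi_hat(pi_hat(z1, z2))
        print(relationship, (z0, z1, z2), phi, classify_kinship(phi))
```

Both pairs are assigned to the first-degree bin from the kinship coefficient alone; separating them requires the individual sharing probabilities, which are estimated poorly when only a few hundred SNPs are available.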

Introduction

Epidemiologic cohorts are a valuable resource for genotype–phenotype studies given their large sample size, racial/ethnic diversity, and wealth of phenotypes, traits, and exposures. Some epidemiologic cohorts seek related individuals during the ascertainment process in order to study the heritability of certain traits in a similar genetic background or to minimize certain environmental differences between individuals. For example, the Framingham Heart Study recruited participants from a Massachusetts town and subsequently enrolled their offspring in later phases of the study to identify factors that contribute to cardiovascular disease (Splansky et al, 2007), whereas the Marshfield Clinic Personalized Medicine Research Project participants are relatively ethnically homogeneous and come from the Marshfield, Wisconsin area (McCarty et al, 2005). Others, such as the National Health and Nutrition Examination Surveys (NHANES), use an ascertainment process in which multiple participants from a single household may be included without documentation of the relationship between those participants (Ezzati et al, 1992).
