Abstract

Genome-wide association studies (GWASs) have been widely used to map loci contributing to variation in complex traits and risk of diseases in humans. Accurate specification of familial relationships is crucial for family-based GWAS, as well as in population-based GWAS with unknown (or unrecognized) family structure. The family structure in a GWAS should be routinely investigated using the SNP data prior to the analysis of population structure or phenotype. Existing algorithms for relationship inference have a major weakness of estimating allele frequencies at each SNP from the entire sample, under a strong assumption of homogeneous population structure. This assumption is often untenable. Here, we present a rapid algorithm for relationship inference using high-throughput genotype data typical of GWAS that allows the presence of unknown population substructure. The relationship of any pair of individuals can be precisely inferred by robust estimation of their kinship coefficient, independent of sample composition or population structure (sample invariance). We present simulation experiments to demonstrate that the algorithm has sufficient power to provide reliable inference on millions of unrelated pairs and thousands of relative pairs (up to 3rd-degree relationships). Application of our robust algorithm to HapMap and GWAS datasets demonstrates that it performs properly even under extreme population stratification, while algorithms assuming a homogeneous population give systematically biased results. Our extremely efficient implementation performs relationship inference on millions of pairs of individuals in a matter of minutes, dozens of times faster than the most efficient existing algorithm known to us. Our robust relationship inference algorithm is implemented in a freely available software package, KING, available for download at http://people.virginia.edu/~wc9c/KING.

Highlights

  • Genome-wide association studies (GWASs) have been widely used to identify common variants that contribute to variation in complex human phenotypes and diseases

  • We present a novel framework for relationship inference, Kinship-based INference for Genome-wide association studies (KING), together with a rapid algorithm for relationship inference appropriate for use on samples with thousands of individuals genotyped at millions of autosomal SNPs, a scale typically achieved in a GWAS

  • We first examined the distribution of actual (realized) IBD sharing between relative pairs, defined as half of the actual proportion of the genome that is shared IBD between the pair of relatives. The actual IBD sharing between a pair of relatives varies around its expectation, except for parent–offspring and monozygotic twin pairs (Visscher et al., 2008)

Summary

Introduction

Genome-wide association studies (GWASs) have been widely used to identify common variants that contribute to variation in complex human phenotypes and diseases. High-throughput genotyping performed in a GWAS presents new opportunities for pedigree error detection, using millions of SNPs to assess the degree of relationship between a pair of individuals. With these opportunities come the challenges of accounting for linkage disequilibrium among typed markers while managing the computational resources needed to analyze the large amount of genotype data. If certain pairs of individuals do not cluster, either because of limited sample size or because the underlying allele frequencies differ between pairs (e.g., in the presence of population structure), GRR fails to detect the pedigree errors. The identical-by-descent (IBD) statistics between each pair of individuals are estimated using the average identity-by-state (IBS) sharing and sample-level allele frequencies estimated at each SNP under Hardy–Weinberg equilibrium (HWE) assumptions.
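To make the contrast with allele-frequency-based IBD estimation concrete, the following is a minimal sketch of a pairwise kinship estimate that uses only the two individuals' own genotypes. The excerpt above does not state the formula; the code assumes the heterozygote-count form commonly cited for KING's robust estimator, and the relationship cutoffs (powers of two between the expected kinship values for each degree) are likewise an assumption rather than something given in this text. The function names `robust_kinship` and `degree_from_kinship` and the 0/1/2 genotype coding are hypothetical choices for illustration.

```python
def robust_kinship(g_i, g_j):
    """Estimate the kinship coefficient for one pair of individuals.

    g_i, g_j: equal-length sequences of genotypes coded as allele counts
    (0, 1 or 2); None marks a missing call. No sample-level allele
    frequencies are used, only the pair's own genotypes.

    ASSUMPTION: this is the heterozygote-count form often cited for the
    KING robust estimator, phi = 1/2 - sum((g_i - g_j)^2) / (4 * min_het),
    where min_het is the smaller of the two heterozygote counts.
    """
    het_i = het_j = 0
    sq_diff = 0
    for a, b in zip(g_i, g_j):
        if a is None or b is None:
            continue  # skip SNPs where either genotype is missing
        het_i += (a == 1)
        het_j += (b == 1)
        # 0 for identical genotypes, 1 for het vs. hom, 4 for opposite homs
        sq_diff += (a - b) ** 2
    min_het = min(het_i, het_j)
    if min_het == 0:
        raise ValueError("no heterozygous genotypes; estimate undefined")
    return 0.5 - sq_diff / (4.0 * min_het)


def degree_from_kinship(phi):
    """Map a kinship estimate to a relationship category.

    ASSUMPTION: cutoffs sit at 2^-1.5, 2^-2.5, 2^-3.5 and 2^-4.5, i.e.
    between the expected kinship values 1/2, 1/4, 1/8 and 1/16.
    """
    if phi > 2 ** -1.5:
        return "duplicate/MZ twin"
    if phi > 2 ** -2.5:
        return "1st-degree"
    if phi > 2 ** -3.5:
        return "2nd-degree"
    if phi > 2 ** -4.5:
        return "3rd-degree"
    return "unrelated or more distant"


if __name__ == "__main__":
    person_a = [1, 1, 1, 1, 0, 2]
    person_b = [0, 2, 1, 1, 0, 2]
    phi = robust_kinship(person_a, person_b)
    print(phi, degree_from_kinship(phi))
```

Because the estimate depends only on the two genotype vectors, adding or removing other individuals from the sample leaves it unchanged, which is the sample-invariance property the abstract describes. A real analysis would run over far more SNPs than this toy example.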
