Abstract

Detection of Identical-By-Descent (IBD) segments provides a fundamental measure of genetic relatedness and plays a key role in a wide range of analyses. We develop FastSMC, an IBD detection algorithm that combines a fast heuristic search with accurate coalescent-based likelihood calculations. FastSMC enables biobank-scale detection and dating of IBD segments within several thousands of years in the past. We apply FastSMC to 487,409 UK Biobank samples and detect ~214 billion IBD segments transmitted by shared ancestors within the past 1500 years, obtaining a fine-grained picture of genetic relatedness in the UK. Sharing of common ancestors strongly correlates with geographic distance, enabling the use of genomic data to localize a sample’s birth coordinates with a median error of 45 km. We seek evidence of recent positive selection by identifying loci with unusually strong shared ancestry and detect 12 genome-wide significant signals. We devise an IBD-based test for association between phenotype and ultra-rare loss-of-function variation, identifying 29 association signals in 7 blood-related traits.

Highlights

  • Detection of Identical-By-Descent (IBD) segments provides a fundamental measure of genetic relatedness and plays a key role in a wide range of analyses

  • The GERMLINE algorithm can become inefficient in regions where certain short haplotypes are extremely common in the population; the resulting hash collisions span a large fraction of samples, so the search effectively reverts to a most-pairs analysis and monopolizes computation time

  • The fast sequentially Markovian coalescent (FastSMC) algorithm produces a list of pairwise IBD segments, each with an IBD quality score, i.e., the average probability that the time to most recent common ancestor (TMRCA) lies between the present and the user-specified time threshold, and an age estimate, i.e., the average maximum a posteriori (MAP) TMRCA along the segment
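
The two per-segment summaries described in the last highlight can be illustrated with a minimal sketch. It assumes that, for one pair of haplotypes, a posterior distribution over discretized TMRCA intervals is already available at every site of the segment. The function name `score_segment`, the input layout, and the interval values are hypothetical choices made for illustration; they are not FastSMC's actual interface or time discretization.

```python
def score_segment(posteriors, interval_upper, interval_rep, time_threshold):
    """Summarize one candidate IBD segment (illustrative sketch only).

    posteriors[s][k] : assumed posterior probability that the TMRCA at site s
                       falls in time interval k (each row sums to 1).
    interval_upper   : upper bound of each interval, sorted ascending.
    interval_rep     : one representative time per interval (e.g. generations).
    time_threshold   : user-specified time threshold, same unit as above.
    Returns (quality_score, age_estimate).
    """
    # Intervals that lie entirely between the present and the threshold.
    n_recent = sum(1 for upper in interval_upper if upper <= time_threshold)

    per_site_prob = []  # P(TMRCA within threshold) at each site
    per_site_map = []   # MAP TMRCA at each site
    for site_post in posteriors:
        per_site_prob.append(sum(site_post[:n_recent]))
        k_map = max(range(len(site_post)), key=site_post.__getitem__)
        per_site_map.append(interval_rep[k_map])

    n_sites = len(posteriors)
    quality_score = sum(per_site_prob) / n_sites  # average probability
    age_estimate = sum(per_site_map) / n_sites    # average MAP TMRCA
    return quality_score, age_estimate


# Toy input: 4 sites, 4 time intervals (made-up numbers, in generations).
interval_upper = [25, 50, 100, float("inf")]
interval_rep = [12.5, 37.5, 75.0, 150.0]
posteriors = [
    [0.50, 0.25, 0.25, 0.00],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.75, 0.25, 0.00, 0.00],
]
score, age = score_segment(posteriors, interval_upper, interval_rep, 50)
print(score, age)
```

With a threshold of 50 generations, the per-site probabilities in this toy input are 0.75, 0.75, 0.25 and 1.0, so the quality score is their mean, and the age estimate is the mean of the per-site MAP times. A real implementation would obtain the posteriors from a coalescent hidden Markov model rather than take them as input.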


Summary

Introduction

Detection of Identical-By-Descent (IBD) segments provides a fundamental measure of genetic relatedness and plays a key role in a wide range of analyses. FastSMC quantifies uncertainty and estimates the time to most recent common ancestor (TMRCA) for individuals that share IBD segments. It does so by efficiently leveraging information provided by allele sharing, genotype frequencies, and demographic history, which results in a cost-effective boost in accuracy. Leveraging the speed and accuracy of FastSMC, we analyze IBD sharing in 487,409 phased individuals from the UK Biobank dataset, identifying and characterizing ~214 billion IBD segments transmitted by shared ancestors within the past 50 generations. This network of shared ancestry enables us to reconstruct a fine-grained picture of time-dependent genomic relatedness in the UK. Using an IBD-based association test, we detect 20 associations between genomic loci harboring loss-of-function (LoF) variants and seven blood-related phenotypes. These results underscore the importance of modeling distant relatedness to reveal subtle population structure, recent evolutionary history, and rare pathogenic variation.

