Abstract

Identity by descent (IBD) inference is the task of computationally detecting genomic segments that are shared between individuals by means of common familial descent. Accurate IBD detection plays an important role in various genomic studies, ranging from mapping disease genes to exploring ancient population histories. The majority of recent work in the field has focused on improving the accuracy of inference, targeting shorter genomic segments that originate from a more ancient common ancestor. The accuracy of these methods, however, is achieved at the expense of high computational cost, resulting in a prohibitively long running time when applied to large cohorts. To enable the study of large cohorts, we introduce SpeeDB, a method that facilitates fast IBD detection in large unphased genotype data sets. Given a target individual and a database of individuals that potentially share IBD segments with the target, SpeeDB applies an efficient opposite-homozygous filter, which excludes chromosomal segments from the database that are highly unlikely to be IBD with the corresponding segments from the target individual. The remaining segments can then be evaluated by any IBD detection method of choice. When examining simulated individuals sharing 4 cM IBD regions, SpeeDB filtered out 99.5% of genomic regions from consideration while retaining 99% of the true IBD segments. Applying the SpeeDB filter prior to detecting IBD in simulated fourth cousins resulted in an overall running time that was 10,000x faster than inferring IBD without the filter and retained 99% of the true IBD segments in the output.

Highlights

  • Identity by descent (IBD) is a fundamental concept in genetics, pertaining to the genetic similarities among individuals who coinherited an allele from a common ancestor

  • Experimental Setup To evaluate the performance of SpeeDB, we used data from 255 individuals from Asian populations of the HapMap Phase III panel [14], and from 1,480 individuals from the 1958 Birth Cohort of the Wellcome Trust Case Control Consortium (WTCCC) [15]

  • We report the sensitivity as the fraction of IBD segments that pass SpeeDB’s filters and the speedup resulting from filtering out large portions of the database

Read more

Summary

Introduction

Identity by descent (IBD) is a fundamental concept in genetics, pertaining to the genetic similarities among individuals who coinherited an allele from a common ancestor. While genomic sequence variants, such as single-nucleotide variations, insertions, deletions, and copy-number variations, are constantly being introduced with every generation, genomic IBD segments have a high probability of being identical due to the relatively low denovo mutation rates [1,2]. We call a sequence of alleles IBD if the alleles have been inherited from a common ancestor without recombination events. Because relatively few recombination events occur in each generation (roughly one per chromosome), the two likely-related individuals co-inherit a genomic segment decreases rapidly with every generation, while IBD segments typically stretch to significant lengths. In the case of two relatives that share a relative g generations ago, a genomic segment that originated from the common ancestors needs to be transmitted over 2g meioses, which corresponds to a probability of 21{2g.

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call