Abstract

Inferring relatedness from genomic data is an essential component of genetic association studies, population genetics, forensics, and genealogy. While numerous methods exist for inferring relatedness, thorough evaluation of these approaches in real data has been lacking. Here, we report an assessment of 12 state-of-the-art pairwise relatedness inference methods using a data set with 2485 individuals contained in several large pedigrees that span up to six generations. We find that all methods have high accuracy (92-99%) when detecting first- and second-degree relationships, but their accuracy dwindles to <43% for seventh-degree relationships. However, most identical by descent (IBD) segment-based methods inferred seventh-degree relatives correct to within one relatedness degree for >76% of relative pairs. Overall, the most accurate methods are Estimation of Recent Shared Ancestry (ERSA) and approaches that compute total IBD sharing using the output from GERMLINE and Refined IBD to infer relatedness. Combining information from the most accurate methods provides little accuracy improvement, indicating that novel approaches, such as new methods that leverage relatedness signals from multiple samples, are needed to achieve a sizeable jump in performance.
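
To make the two accuracy measures above concrete (exact degree, and correct to within one degree), the following is a minimal Python sketch, not the evaluation code used in the study. It assumes a pair's degree is assigned from an estimated kinship coefficient using power-of-two bins, the convention used by KING (Manichaikul et al 2010), in which degree d has expected kinship 2^-(d+1). The function names and toy values are hypothetical and chosen only for illustration.

```python
import math


def degree_from_kinship(phi, max_degree=7):
    """Map a kinship coefficient to a relatedness degree (assumed binning).

    Degree d has expected kinship 2**-(d + 1); bin edges sit halfway
    between neighbouring degrees on the log2 scale. Returns 0 for
    duplicates/monozygotic twins and None for pairs treated as unrelated.
    """
    if phi <= 0:
        return None
    degree = max(0, math.floor(-math.log2(phi) - 0.5))
    return degree if degree <= max_degree else None


def accuracy(true_degrees, inferred_degrees, tolerance=0):
    """Fraction of pairs whose inferred degree is within `tolerance` of the truth."""
    hits = sum(
        1
        for truth, guess in zip(true_degrees, inferred_degrees)
        if guess is not None and abs(truth - guess) <= tolerance
    )
    return hits / len(true_degrees)


# Toy data (hypothetical): four pairs with known pedigree degrees.
true_degrees = [1, 2, 7, 7]
kinship_estimates = [0.25, 0.12, 0.008, 0.001]
inferred = [degree_from_kinship(phi) for phi in kinship_estimates]

exact = accuracy(true_degrees, inferred)                    # exact-degree accuracy
within_one = accuracy(true_degrees, inferred, tolerance=1)  # within one degree
```

With distant relatives, a small error in the estimated sharing often moves a pair into a neighbouring bin, which is why the within-one-degree figure can be much higher than the exact-degree figure.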

Highlights

  • Motivated by the substantial need to identify relatives in modern samples, we present an evaluation of 12 state-of-the-art pairwise relatedness methods, each capable of scaling to analyze thousands of individuals, including seven that directly infer genome-wide relatedness measures (Manichaikul et al 2010; Thornton et al 2012; Li et al 2014; Moltke and Albrechtsen 2014; Sun and Dimitromanolakis 2014; Chang et al 2015; Conomos et al 2016) and five identical by descent (IBD) segment detection methods (Gusev et al 2009; Browning and Browning 2011a, 2013a,b; Durand et al 2014) that we used to infer these quantities

  • We used SNP array genotypes from Mexican American individuals contained in large pedigrees from the San Antonio Mexican American Family Studies (SAMAFS) (Mitchell et al 1996; Duggirala et al 1999; Hunt et al 2005)

Summary

Introduction

Inferring relatedness from genomic data is an essential component of genetic association studies, population genetics, forensics, and genealogy. The IBDseq method performs well for inferring first- through seventh-degree relatives, but it classifies a much larger fraction of reportedly unrelated pairs as related than the other methods do, suggesting it may be biased toward detecting higher levels of IBD sharing.

