Comparison of identity by descent estimates with Plink and refinedIBD in dogs

Gábor Mészáros

doi:10.15414/afz.2020.23.04.213-216

Abstract

Article Details: Received: 2020-05-26 | Accepted: 2020-07-10 | Available online: 2020-12-31

With the availability of dense SNP genotype data, various methods were developed to estimate the relatedness of any two individuals, even in the absence of traditional pedigrees. One of the most prominent measures is identity by descent (IBD), widely used in genetic diversity studies. IBD itself can be estimated using different approaches and software, which might provide different results. The purpose of this study was to compare the estimates from two established software packages: the probabilistic approach of Plink and the non-probabilistic, haplotype-based approach of refinedIBD. High density SNP genotypes from 98 Leonberger dogs were used to estimate IBD coefficients based on two data types: one in which one SNP of each marker pair in high linkage disequilibrium was removed, as required by Plink, and one in which SNP markers were subjected only to standard quality control, as required by refinedIBD. The Pearson correlation coefficients from pairwise estimates were 0.97 when estimated with the same software and 0.84 between the two software packages, each applied to the data type required by its user manual. The numerical differences were clustered around zero (i.e. little to no difference) for half of the pairwise comparisons, and were within ±0.1 for the vast majority of cases. In the most extreme cases, the Plink estimates were consistently the higher of the two. Because of these differences, a follow-up investigation should be done, including pedigrees as well as simulated data, to provide a comprehensive analysis.

Keywords: SNP, Plink, refinedIBD, Leonberger, companion animals

References

BROWNING, B.L. and BROWNING, S.R. (2013). Improving the Accuracy and Efficiency of Identity-by-Descent Detection in Population Data. Genetics, 194(2), 459–471. https://doi.org/10.1534/genetics.113.150029
CHANG, C.C., CHOW, C.C., TELLIER, L.C., VATTIKUTI, S., PURCELL, S.M. and LEE, J.J. (2015). Second-Generation PLINK: Rising to the Challenge of Larger and Richer Datasets. GigaScience, 4, 7. https://doi.org/10.1186/s13742-015-0047-8
NASERI, A., LIU, X., TANG, K., ZHANG, S. and ZHI, D. (2019). RaPID: Ultra-Fast, Powerful, and Accurate Detection of Segments Identical by Descent (IBD) in Biobank-Scale Cohorts. Genome Biology, 20(1), 143. https://doi.org/10.1186/s13059-019-1754-8
R CORE TEAM (2019). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
SPEED, D. and BALDING, D.J. (2015). Relatedness in the Post-Genomic Era: Is It Still Useful? Nature Reviews Genetics, 16(1), 33–44. https://doi.org/10.1038/nrg3821
TAYLOR, A.R., JACOB, P.E., NEAFSEY, D.E. and BUCKEE, C.O. (2019). Estimating Relatedness Between Malaria Parasites. Genetics, 212(4), 1337–1351. https://doi.org/10.1534/genetics.119.302120
WEIR, B.S., ANDERSON, A.D. and HEPLER, A.B. (2006). Genetic Relatedness Analysis: Modern Data and New Challenges. Nature Reviews Genetics, 7(10), 771–780. https://doi.org/10.1038/nrg1960
WRIGHT, S. (1922). Coefficients of Inbreeding and Relationship. The American Naturalist, 56(645), 330–338.

Highlights

Individuals from the same family or the same population are related to each other due to shared ancestry (Weir et al., 2006).
The purpose of this study was to compare the estimates from two established software packages: the probabilistic approach of Plink and the non-probabilistic, haplotype-based approach of refinedIBD.
High density SNP genotypes from 98 Leonberger dogs were used to estimate identity by descent (IBD) coefficients based on two data types: one in which one SNP of each marker pair in high linkage disequilibrium was removed, as required by Plink, and one in which SNP markers were subjected only to standard quality control, as required by refinedIBD.

Summary

Introduction

Individuals from the same family or the same population are related to each other due to shared ancestry (Weir et al., 2006). There are many different ways to measure genetic similarity between individuals, as reviewed in Speed and Balding (2015). We focus on identity by descent (IBD), a relatedness measure that can be estimated from genetic markers, given the probabilities that two individuals share zero, one or two alleles at a locus (Weir et al., 2006). IBD values range from zero for unrelated individuals to one for identical twins or clones; in the absence of inbreeding, IBD sharing is broken down by recombination (Wright, 1922). The unique strength of IBD compared to other population genetics measures is its efficiency in tracking distant relatives: IBD genome fragments are lost at an exponential rate per meiosis, whereas their length decreases only linearly with the reciprocal of the number of meioses (Naseri et al., 2019).
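To make the zero-to-one scale concrete, the following minimal Python sketch combines the three sharing probabilities into a single proportion of alleles shared IBD, using the common summary 0.5 × P(IBD = 1) + P(IBD = 2). Plink reports a statistic of this form as PI_HAT, but the source does not state which statistic was compared, so this is an illustrative assumption. The function name `ibd_proportion` and the example relationship values (textbook expectations for non-inbred pairs) are likewise illustrative and are not data from the study.

```python
def ibd_proportion(z0: float, z1: float, z2: float, tol: float = 1e-6) -> float:
    """Expected proportion of alleles shared IBD by a pair of individuals.

    z0, z1, z2 are the probabilities of sharing zero, one or two alleles
    IBD at a locus; they must be non-negative and sum to one.
    """
    if min(z0, z1, z2) < 0 or abs(z0 + z1 + z2 - 1.0) > tol:
        raise ValueError("z0, z1, z2 must be non-negative and sum to 1")
    # Sharing one allele counts as half, sharing two alleles counts fully.
    return 0.5 * z1 + z2


# Textbook expectations for non-inbred pairs (illustrative, not study data).
expected_sharing = {
    "unrelated": (1.0, 0.0, 0.0),
    "parent-offspring": (0.0, 1.0, 0.0),
    "full siblings": (0.25, 0.5, 0.25),
    "identical twins": (0.0, 0.0, 1.0),
}

for relationship, (z0, z1, z2) in expected_sharing.items():
    print(f"{relationship}: {ibd_proportion(z0, z1, z2):.2f}")
```

Under these assumptions, unrelated pairs give 0, parent-offspring and full-sibling pairs give 0.5, and identical twins give 1, matching the range described above.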

Objectives

Methods

Results

Conclusion