Abstract

A major use of genetic data is parentage verification and identification, as inaccurate pedigrees negatively affect genetic gain. Since 2012 the international standard for single nucleotide polymorphism (SNP) verification in Bos taurus cattle has been the ISAG SNP panels. While these ISAG panels provide greater parentage accuracy than microsatellite markers (MS), they can validate the wrong parent at ≤1% misconcordance rate levels, indicating that more SNP are needed if a more accurate pedigree is required. With rapidly increasing numbers of cattle being genotyped in Ireland, representing 61 B. taurus breeds from a wide range of farm types (beef/dairy, AI/pedigree/commercial, purebred/crossbred, and large to small herd sizes), the Irish Cattle Breeding Federation (ICBF) analyzed different SNP densities and determined that at least 500 SNP are needed to consistently predict only one set of parents at a ≤1% misconcordance rate. For parentage validation and prediction ICBF uses 800 SNP (ICBF800) selected based on SNP clustering quality, ISAG200 inclusion, call rate (CR), and minor allele frequency (MAF) in the Irish cattle population. Large datasets require both sample and SNP quality control (QC). Most publications deal only with SNP QC via CR, MAF, parent-progeny conflicts, and Hardy-Weinberg deviation, but not with sample QC. We report here parentage, SNP QC, and genomic sample QC pipelines that deal with the unique challenges of >1 million genotypes from a national herd, such as SNP genotype errors from mis-tagging of animals, lab errors, farm errors, and multiple other issues that can arise. We divide the pipeline into two parts: a Genotype QC and an Animal QC pipeline. The Genotype QC identifies samples with low call rate, missing or mixed genotype classes (no BB genotype or ABTG alleles present), and low genotype frequencies. The Animal QC handles situations where the genotype might not belong to the listed individual by identifying: >1 non-matching genotypes per animal, SNP duplicates, sex and breed prediction mismatches, parentage and progeny validation results, and other situations. The Animal QC pipeline makes use of the ICBF800 SNP set where appropriate to identify errors in a computationally efficient yet still highly accurate manner.
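To make the call rate and misconcordance checks concrete, the following is a minimal Python sketch, not the ICBF pipeline itself. It assumes genotypes are coded as counts of the B allele (0 = AA, 1 = AB, 2 = BB, `None` = no call) and that a misconcordance is an opposing-homozygote conflict between parent and progeny (AA versus BB); the abstract does not define the term, so this definition, the function names, and the genotype coding are all illustrative assumptions. Only the 1% threshold comes from the text.

```python
from typing import Optional, Sequence

# Assumed coding: count of B alleles (0 = AA, 1 = AB, 2 = BB); None = no call.
Genotype = Optional[int]


def call_rate(genotypes: Sequence[Genotype]) -> float:
    """Fraction of SNP in a sample that received a genotype call."""
    if not genotypes:
        return 0.0
    return sum(g is not None for g in genotypes) / len(genotypes)


def misconcordance_rate(parent: Sequence[Genotype],
                        progeny: Sequence[Genotype]) -> float:
    """Fraction of jointly called SNP where parent and progeny are
    opposing homozygotes (AA vs BB). This definition is an assumption."""
    compared = 0
    conflicts = 0
    for p, o in zip(parent, progeny):
        if p is None or o is None:
            continue  # skip SNP missing in either animal
        compared += 1
        if {p, o} == {0, 2}:
            conflicts += 1
    return conflicts / compared if compared else float("nan")


def parent_validates(parent: Sequence[Genotype],
                     progeny: Sequence[Genotype],
                     max_rate: float = 0.01) -> bool:
    """True when the pair is at or below the misconcordance threshold.
    A pair with no jointly called SNP (NaN rate) never validates."""
    return misconcordance_rate(parent, progeny) <= max_rate


# Toy six-SNP genotypes, purely illustrative.
progeny = [0, 1, 2, 2, None, 0]
candidate_a = [0, 1, 1, 2, 2, 1]   # no opposing homozygotes
candidate_b = [2, 1, 0, 2, 0, 1]   # two opposing homozygotes

for name, cand in (("A", candidate_a), ("B", candidate_b)):
    print(name, misconcordance_rate(cand, progeny), parent_validates(cand, progeny))
```

With only a handful of SNP a single conflict moves the rate far past 1%, which is one way to see why the number of markers matters when the threshold is this strict.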

Highlights

  • Since the 1960s bovine pedigree verification has been performed with various technologies, initially with blood groups (Stormont, 1967), then with microsatellite markers (MS) (Davis and Denise, 1998), and later transitioning to single nucleotide polymorphisms (SNP) (Heaton et al., 2002)

  • While the probability of exclusion (PE) values for the International Society for Animal Genetics (ISAG) SNP panels appear sufficient for accurate parentage, Vandeputte (2012) notes that many reported PE values are overly optimistic, that increasing numbers of markers are needed to maintain the same PE value as the population size increases, and that a marker set with a high PE value can still have a low probability of complete exclusion of all false parentage with large

  • While there is currently no international standard for which or how many SNP to use for parentage outside of the ISAG set, we argue that a larger SNP set should be used, such as the 800 SNP set (ICBF800) that the Irish Cattle Breeding Federation (ICBF) has developed and uses for parentage validation and prediction (McClure et al., 2015)
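
The point attributed to Vandeputte (2012), that a high per-candidate PE can still leave a low chance of excluding every false parent, can be illustrated with a small calculation. The sketch below assumes each false candidate is excluded independently with the same probability PE, so the probability that all N false candidates are excluded is PE raised to the power N. The independence assumption, the function name, and the example values are illustrative and are not taken from the paper.

```python
def prob_all_false_excluded(pe: float, n_candidates: int) -> float:
    """Probability that every one of n_candidates false parents is excluded,
    assuming independent exclusions with a common per-candidate PE."""
    if not 0.0 <= pe <= 1.0:
        raise ValueError("pe must be between 0 and 1")
    if n_candidates < 0:
        raise ValueError("n_candidates must be non-negative")
    return pe ** n_candidates


# Hypothetical per-candidate PE and candidate pool sizes.
pe = 0.9999
for n in (1, 100, 10_000, 1_000_000):
    print(f"N = {n:>9,}: P(all false candidates excluded) = "
          f"{prob_all_false_excluded(pe, n):.4f}")
```

Under these assumptions the probability of complete exclusion falls quickly as the candidate pool grows, which is consistent with the argument for a larger SNP set in a national herd.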

Introduction

Since the 1960s bovine pedigree verification has been performed with various technologies, initially with blood groups (Stormont, 1967), then with microsatellite markers (MS) (Davis and Denise, 1998), and later transitioning to single nucleotide polymorphisms (SNP) (Heaton et al., 2002). As with all technology, there is a need to balance cost with performance; for parentage validation the question has typically been how many markers are needed to obtain a high probability, but not necessarily 100%, that the reported parentage is correct. While the probability of exclusion (PE) values for the ISAG SNP panels appear sufficient for accurate parentage, Vandeputte (2012) notes that many reported PE values are overly optimistic, that increasing numbers of markers are needed to maintain the same PE value as the population size increases, and that a marker set with a high PE value can still have a low probability of complete exclusion of all false parentage with large
