Abstract

In this paper, we evaluate using genotype‐by‐sequencing (GBS) data to perform parentage assignment in lieu of traditional array data. The use of GBS data raises two issues. First, for low‐coverage (e.g., <2×) GBS data, it may not be possible to call the genotype at many loci, a critical first step for detecting opposing homozygous markers. Second, the amount of sequencing coverage may vary across individuals, making it challenging to directly compare the likelihood scores between putative parents. To address these issues, we extend the probabilistic framework of Huisman (Molecular Ecology Resources, 2017, 17, 1009) and evaluate putative parents by comparing their (potentially noisy) genotypes to a series of proposal distributions. These distributions describe the expected genotype probabilities for the relatives of an individual. We assign a putative parent as a parent if it is classified as a parent (as opposed to, e.g., an unrelated individual) and if its assignment score passes a threshold. We evaluated this method on simulated data and found that (a) high‐coverage (>2×) GBS data performs similarly to array data and requires only a small number of markers to correctly assign parents and (b) low‐coverage GBS data (as low as 0.1×) can also be used, provided that it is obtained across a large number of markers. When analysing the low‐coverage GBS data, we also found a high number of false positives when the true parent was not in the list of candidate parents, but this false positive rate could be greatly reduced by manually tuning the assignment threshold. We provide this parentage assignment method as a standalone program called AlphaAssign.
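The idea of scoring a candidate from read counts, without first calling genotypes, can be illustrated with a minimal sketch. This is not the AlphaAssign implementation: it assumes biallelic markers, a simple binomial read model with a fixed sequencing error rate, Hardy-Weinberg genotype frequencies, and an ungenotyped second parent drawn at random from the population. All function names and the default `error=0.01` are hypothetical choices made for this illustration. The score is a log-likelihood ratio of "candidate is a parent" against "candidate is unrelated", summed over loci.

```python
import math

def genotype_likelihoods(ref_reads, alt_reads, error=0.01):
    """P(reads | genotype) for 0, 1, 2 copies of the alternative allele.
    Assumed binomial read model; the constant binomial coefficient is dropped."""
    p_alt_read = (error, 0.5, 1.0 - error)
    return [q ** alt_reads * (1.0 - q) ** ref_reads for q in p_alt_read]

def hardy_weinberg(freq):
    """Population genotype probabilities for alternative-allele frequency freq."""
    return [(1 - freq) ** 2, 2 * freq * (1 - freq), freq ** 2]

def offspring_given_parent(parent_genotype, freq):
    """P(offspring genotype | one parent's genotype), with the other parent's
    allele drawn from the population (an assumption of this sketch)."""
    t = parent_genotype / 2.0  # chance the known parent transmits the alt allele
    return [(1 - t) * (1 - freq), t * (1 - freq) + (1 - t) * freq, t * freq]

def locus_log_ratio(child_reads, candidate_reads, freq, error=0.01):
    """log P(reads | candidate is parent) - log P(reads | candidate is unrelated)."""
    child = genotype_likelihoods(*child_reads, error=error)
    cand = genotype_likelihoods(*candidate_reads, error=error)
    prior = hardy_weinberg(freq)
    # Sum over every (candidate genotype, offspring genotype) pair, so that
    # uncertain genotypes at low coverage are weighted rather than called.
    as_parent = sum(
        prior[gp] * cand[gp]
        * sum(offspring_given_parent(gp, freq)[gc] * child[gc] for gc in range(3))
        for gp in range(3)
    )
    as_unrelated = (
        sum(prior[g] * child[g] for g in range(3))
        * sum(prior[g] * cand[g] for g in range(3))
    )
    return math.log(as_parent) - math.log(as_unrelated)

def assignment_score(child, candidate, freqs, error=0.01):
    """Sum of per-locus log ratios; each read entry is (ref_reads, alt_reads)."""
    return sum(locus_log_ratio(c, p, f, error) for c, p, f in zip(child, candidate, freqs))
```

For example, `assignment_score([(0, 3), (1, 0)], [(0, 2), (0, 0)], [0.5, 0.3])` scores a candidate over two loci, the second of which has no reads for the candidate and therefore contributes nothing. A locus with no reads is uninformative rather than missing, which is why the same calculation applies at any coverage. Comparing the resulting score to a threshold, and handling unequal coverage between candidates, are left out of this sketch.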

Highlights

  • In this paper, we evaluate the performance of using genotype‐by‐sequence (GBS) data to perform parentage assignment in commercial plant and animal breeding settings

  • In the remaining scenarios we focused on the case where both parents and progeny had GBS data and analysed (b) the impact of knowing and genotyping the known alternative parent, (c) the impact of restricting the pool of putative parents to either 100 unrelated individuals, 45 half sibs, or the four full sibs, and (d) examined how the false positive rate changed depending on the threshold used for assignment

  • If the true parent was excluded from the list of putative parents, we found that the false positive rate was less than 0.2% in all cases

Introduction

We evaluate the performance of using genotype‐by‐sequence (GBS) data to perform parentage assignment in commercial plant and animal breeding settings. When the parents of an individual are not recorded, parentage assignment algorithms can use genetic data to reconstruct parent–child relationships. Much of the previous work on parentage assignment has focused on the case where the genetic data were generated from microsatellite markers or, more recently, from SNP arrays (Fisher, Malthus, Walker, Corbett, & Spelman, 2009; Riester, Stadler, & Klemm, 2009; Tokarska et al., 2009). In the case of SNP arrays, between 50 and 700 markers are required to accurately assign parents and rule out false assignments (Fisher et al., 2009; Strucken et al., 2016; Tortereau, Moreno, Tosser‐Klopp, Servin, & Raoul, 2017).
