Abstract

Many studies use targeted whole-genome sequencing (WGS) experiments to identify rare and causal variants within populations. As a consequence of their experimental design, many of these surveys sequence redundant haplotype segments, because those segments occur at high frequency in the base population, and the variants discovered in the sequencing data are difficult to phase. We propose a new algorithm, called inverse weight selection (IWS), that preferentially selects individuals based on the cumulative presence of rare haplotypes to maximize the efficiency of WGS surveys. To test the efficacy of this method, we used genotype data from 112,113 registered Holstein bulls derived from the US national dairy database. We demonstrate that IWS is at least 6.8% more efficient than previously published methods at selecting the fewest individuals required to sequence all haplotype segments with a frequency of ≥4% in the US Holstein population. We also suggest that future surveys focus on sequencing homozygous haplotype segments as a first pass, which achieves a 50% reduction in cost with the added benefit of phasing variant calls efficiently. Together, this new selection algorithm and experimental design suggestion significantly reduce the overall cost of variant discovery through WGS experiments, making surveys for causal variants influencing disease and production even more efficient.
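The selection idea can be illustrated with a minimal Python sketch. It is not the published IWS implementation: the abstract does not specify the weighting or the selection loop, so the sketch assumes that each haplotype is weighted by the inverse of its frequency, that frequency is approximated by the fraction of individuals carrying the haplotype, and that individuals are chosen greedily by the summed weight of the haplotypes they carry that have not yet been sequenced. The function name `inverse_weight_selection`, the `carriers` input layout, and the example bull identifiers are all hypothetical.

```python
from collections import Counter


def inverse_weight_selection(carriers, min_freq=0.04):
    """Greedy sketch of inverse-weight selection (illustrative assumption only).

    carriers: dict mapping an individual ID to the set of haplotype IDs it carries.
    min_freq: only haplotypes at or above this frequency must be covered.
    Returns the list of selected individual IDs in selection order.
    """
    n = len(carriers)
    counts = Counter(h for haps in carriers.values() for h in haps)
    # Assumption: frequency is approximated as the fraction of carrier individuals.
    freq = {h: c / n for h, c in counts.items()}
    targets = {h for h, f in freq.items() if f >= min_freq}
    # Rare haplotypes receive large weights, common haplotypes small ones.
    weight = {h: 1.0 / freq[h] for h in targets}

    uncovered = set(targets)
    selected = []
    ids = sorted(carriers)  # fixed order makes tie-breaking deterministic
    while uncovered:
        # Score each individual by the total weight of its still-uncovered haplotypes.
        best = max(ids, key=lambda i: sum(weight[h] for h in carriers[i] & uncovered))
        selected.append(best)
        uncovered -= carriers[best]
    return selected


# Hypothetical toy data: four bulls and four haplotype segments.
example = {
    "bull_A": {"h1", "h2"},
    "bull_B": {"h1", "h3"},
    "bull_C": {"h1"},
    "bull_D": {"h2", "h3", "h4"},
}
print(inverse_weight_selection(example))  # ['bull_D', 'bull_A']
```

In this toy example, bull_D is chosen first because it carries the rarest haplotype (h4) together with two moderately rare ones. Only the common haplotype h1 then remains, and any of its carriers completes the coverage.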

