Abstract

Multiple methods to detect copy number variants (CNV) relying on different types of data have been developed, and CNV have been shown to affect phenotypes of numerous traits of economic importance in cattle, such as reproduction and immunity. Further improvements in CNV detection are still needed with regard to the trade-off between high true positive and low false positive variant identification rates. Instead of improving single CNV detection methods, variants can be identified in silico with high confidence when multiple methods and datasets are combined. Here, CNV were identified from whole-genome sequences (WGS) and genotype array (GEN) data on 96 Holstein animals. After CNV detection, two sets of high confidence CNV regions (CNVR) were created that contained variants found in both WGS and GEN data, following an animal-based (n = 52) and a population-based (n = 36) pipeline. Furthermore, the change in false positive CNV identification rates using different GEN marker densities was evaluated. The population-based approach characterized CNVR that were more often shared among animals (average 40% more samples per CNVR) and were more often linked to putative functions (48 vs 56% of CNVR) than CNV identified with the animal-based approach. Moreover, false positive identification rates of up to 22% were estimated from GEN information. Further research using larger datasets should use a population-wide approach to identify high confidence CNVR.
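The difference between the two pipelines can be illustrated with a minimal sketch. The abstract does not state the overlap criteria or merging rules used in the study, so everything below is an assumption made for illustration: the function names, the inclusive (chromosome, start, end) coordinates, the "at least one shared base" overlap rule, and the invented toy calls. It is one plausible reading of "animal-based" versus "population-based", not the authors' implementation.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) intervals share at least one base (assumed rule)."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def merge_regions(intervals):
    """Merge overlapping (chrom, start, end) intervals into regions (CNVR)."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the current region when the next interval overlaps it.
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def animal_based(wgs, gen):
    """Keep a WGS call only if the SAME animal has an overlapping GEN call, then merge."""
    kept = []
    for animal, wgs_calls in wgs.items():
        gen_calls = gen.get(animal, [])
        kept.extend(w for w in wgs_calls if any(overlaps(w, g) for g in gen_calls))
    return merge_regions(kept)


def population_based(wgs, gen):
    """Merge calls across all animals per data source, then keep WGS regions that overlap a GEN region."""
    wgs_regions = merge_regions([c for calls in wgs.values() for c in calls])
    gen_regions = merge_regions([c for calls in gen.values() for c in calls])
    return [w for w in wgs_regions if any(overlaps(w, g) for g in gen_regions)]


# Invented toy calls: animal ID -> list of (chrom, start, end).
wgs_calls = {"A1": [("chr1", 100, 200)], "A2": [("chr1", 500, 600)]}
gen_calls = {"A1": [("chr1", 150, 250)], "A3": [("chr1", 550, 650)]}

animal_cnvr = animal_based(wgs_calls, gen_calls)
population_cnvr = population_based(wgs_calls, gen_calls)
```

In this toy input, the region on chr1 at 500-600 is supported by WGS in one animal and by GEN in a different animal, so only the population-level function retains it. The toy data are not meant to reproduce the CNVR counts reported above.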

Highlights

  • Dairy cattle genetics has made great advances since the effects of single nucleotide polymorphisms (SNP) have been recognized on a wide range of mono- or polygenic traits economically important for the dairy industry[1,2,3,4,5]

  • To fairly compare copy number variants (CNV) identified from these two sources of information, it is important to choose a single-sample whole-genome sequence (WGS) identification method that relies on the read depth of the sequences

  • Whole-genome sequences of Holstein animals were aligned to the bovine reference genome ARS-UCD1.2[40]

Summary

Introduction

Dairy cattle genetics has made great advances since the effects of single nucleotide polymorphisms (SNP) have been recognized on a wide range of mono- or polygenic traits economically important for the dairy industry[1,2,3,4,5]. CNV have been identified in cattle from SNP genotyping arrays (GEN)[15,16,17,18,19,20,21], hybridization arrays (ACGH)[14,22,23,24,25], and whole-genome sequences (WGS)[26,27,28,29,30,31]. With a larger N50 contig size (26.3 vs 0.097 Mb) and a drastic reduction in the number of gaps (393 vs 72,051), the latest bovine reference genome assembly (ARS-UCD1.2)[40] is a clear improvement over its predecessor UMD3.1[41], and CNV are expected to be identified with more confidence. An estimate of the effect of marker density on the false positive CNV identification rate is lacking when relying on ARS-UCD1.2.
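For readers unfamiliar with the N50 contig size used above to compare the two assemblies, the following minimal sketch shows how the metric is conventionally computed: N50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The function name and the contig lengths are made up for illustration and are not taken from either assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down, accumulating covered length.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical contig lengths (arbitrary units), total length 32.
example_n50 = n50([2, 2, 2, 3, 3, 4, 8, 8])
```

A larger N50 means that half of the assembly sits in longer contiguous pieces, which is why the increase from 0.097 to 26.3 Mb is read as a gain in contiguity.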
