Abstract

Background: Genotyping of sequence variants typically involves, as a first step, the alignment of sequencing reads to a linear reference genome. Because a linear reference genome represents only a small fraction of all the DNA sequence variation within a species, reference allele bias may occur at highly polymorphic or divergent regions of the genome. Graph-based methods facilitate the comparison of sequencing reads to a variation-aware genome graph, which incorporates a collection of non-redundant DNA sequences that segregate within a species. We compared the accuracy and sensitivity of graph-based sequence variant genotyping using the Graphtyper software to two widely-used methods, i.e., GATK and SAMtools, which rely on linear reference genomes using whole-genome sequencing data from 49 Original Braunvieh cattle.

Results: We discovered 21,140,196, 20,262,913, and 20,668,459 polymorphic sites using GATK, Graphtyper, and SAMtools, respectively. Comparisons between sequence variant genotypes and microarray-derived genotypes showed that Graphtyper outperformed both GATK and SAMtools in terms of genotype concordance, non-reference sensitivity, and non-reference discrepancy. The sequence variant genotypes that were obtained using Graphtyper had the smallest number of Mendelian inconsistencies between sequence-derived single nucleotide polymorphisms and indels in nine sire-son pairs. Genotype phasing and imputation using the Beagle software improved the quality of the sequence variant genotypes for all the tools evaluated, particularly for animals that were sequenced at low coverage. Following imputation, the concordance between sequence- and microarray-derived genotypes was almost identical for the three methods evaluated, i.e., 99.32, 99.46, and 99.24% for GATK, Graphtyper, and SAMtools, respectively. Variant filtration based on commonly used criteria improved genotype concordance slightly but it also decreased sensitivity. Graphtyper required considerably more computing resources than SAMtools but less than GATK.

Conclusions: Sequence variant genotyping using Graphtyper is accurate, sensitive and computationally feasible in cattle. Graph-based methods enable sequence variant genotyping from variation-aware reference genomes that may incorporate cohort-specific sequence variants, which is not possible with the current implementation of state-of-the-art methods that rely on linear reference genomes.
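
The three accuracy metrics named above, and the Mendelian check in sire-son pairs, can be illustrated with a minimal Python sketch. It assumes the definitions that are commonly used when sequence-derived genotypes are compared with array genotypes; the abstract does not spell out the formulas, so the exact definitions applied in the study may differ. Genotypes are coded as the number of alternate alleles (0, 1, 2), `None` marks a missing call, and the function names are hypothetical.

```python
from typing import Dict, Optional, Sequence

Genotype = Optional[int]  # 0, 1, 2 = number of alternate alleles; None = missing


def concordance_metrics(array_gt: Sequence[Genotype],
                        seq_gt: Sequence[Genotype]) -> Dict[str, float]:
    """Compare sequence-derived genotypes with array genotypes (assumed truth)."""
    compared = matches = homref_matches = 0
    array_nonref = detected_nonref = 0
    for a, s in zip(array_gt, seq_gt):
        if a is None:
            continue  # no array genotype to compare against
        if a > 0:
            # Non-reference in the array; a missing sequence call counts as a miss.
            array_nonref += 1
            if s is not None and s > 0:
                detected_nonref += 1
        if s is None:
            continue  # concordance and discrepancy use called genotypes only
        compared += 1
        if a == s:
            matches += 1
            if a == 0:
                homref_matches += 1

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    # Discrepancy ignores genotype pairs that agree on homozygous reference.
    informative = compared - homref_matches
    return {
        "concordance": ratio(matches, compared),
        "nrs": ratio(detected_nonref, array_nonref),
        "nrd": 1.0 - ratio(matches - homref_matches, informative),
    }


def opposing_homozygotes(sire_gt: Sequence[Genotype],
                         son_gt: Sequence[Genotype]) -> int:
    """Count biallelic sites where sire and son are homozygous for different alleles."""
    return sum(1 for p, o in zip(sire_gt, son_gt)
               if p is not None and o is not None and {p, o} == {0, 2})


if __name__ == "__main__":
    array = [0, 0, 1, 1, 2, 2, 1, None]
    seq = [0, 0, 1, 2, 2, None, 0, 1]
    print(concordance_metrics(array, seq))
    print(opposing_homozygotes([0, 2, 1, 2, None], [2, 2, 0, 0, 1]))
```

When only the sire is genotyped, opposing homozygous genotypes are the only Mendelian inconsistencies that can be detected at biallelic sites, which is why the second function checks nothing else.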

Highlights

  • Genotyping of sequence variants typically involves, as a first step, the alignment of sequencing reads to a linear reference genome

  • The sequencing read data of 49 cattle were deposited at European Nucleotide Archive (ENA) under primary accession PRJEB28191

  • Because variant filtering has a strong impact on the accuracy and sensitivity of sequence variant genotyping [53, 54], we evaluated both the raw variants that were detected using default parameters for variant discovery (Fig. 1) and variants that remained after applying filtering criteria that are commonly used but may differ slightly between different software tools


Summary

Introduction

Genotyping of sequence variants typically involves, as a first step, the alignment of sequencing reads to a linear reference genome. We compared the accuracy and sensitivity of graph-based sequence variant genotyping using the Graphtyper software to two widely-used methods, i.e., GATK and SAMtools, which rely on linear reference genomes using whole-genome sequencing data from 49 Original Braunvieh cattle. The multi-sample sequence variant genotyping approach that is implemented in the SAMtools software has to be restarted for the entire cohort once new samples are added. GATK implements two different approaches to multi-sample variant discovery, i.e., the UnifiedGenotyper and HaplotypeCaller modules, with the latter relying on intermediate files in gVCF format that include probabilistic data on variant and non-variant sites for each sequenced sample. Once new samples are added to an existing cohort, only the joint genotyping step across samples needs to be repeated for the entire cohort, because the per-sample gVCF files can be reused; this enables computationally efficient parallelization of sequence variant genotyping in a large number of samples.
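
The incremental gVCF workflow described above can be sketched as a small Python planner that lists which commands would have to be run when samples are added to a cohort. This is an illustration under stated assumptions, not the pipeline used in the study: it assumes GATK4-style command lines, and the function name, the file naming scheme (`<sample>.bam`, `<sample>.g.vcf.gz`) and the output names are hypothetical. The commands are only assembled as lists; nothing is executed.

```python
from typing import List, Sequence


def plan_gvcf_update(reference: str,
                     existing_samples: Sequence[str],
                     new_samples: Sequence[str]) -> List[List[str]]:
    """Return the commands needed after adding new samples to a cohort."""
    commands: List[List[str]] = []

    # Per-sample variant discovery is needed only for the new samples;
    # gVCF files of existing samples are assumed to be on disk already.
    for sample in new_samples:
        commands.append([
            "gatk", "HaplotypeCaller",
            "-R", reference,
            "-I", f"{sample}.bam",
            "-O", f"{sample}.g.vcf.gz",
            "-ERC", "GVCF",
        ])

    # Joint genotyping is repeated for the entire cohort (old and new samples).
    combine = ["gatk", "CombineGVCFs", "-R", reference]
    for sample in list(existing_samples) + list(new_samples):
        combine += ["-V", f"{sample}.g.vcf.gz"]
    combine += ["-O", "cohort.g.vcf.gz"]
    commands.append(combine)

    commands.append([
        "gatk", "GenotypeGVCFs",
        "-R", reference,
        "-V", "cohort.g.vcf.gz",
        "-O", "cohort.vcf.gz",
    ])
    return commands


if __name__ == "__main__":
    for cmd in plan_gvcf_update("ref.fa", ["s1", "s2"], ["s3"]):
        print(" ".join(cmd))
```

With two existing samples and one new sample, the plan contains a single per-sample discovery command, whereas an approach without reusable intermediate files would have to reprocess all three samples.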
