ReliableGenome: annotation of genomic regions with high/low variant calling concordance.

Niko Popitsch, Anna Schuh, Jenny C Taylor

doi:10.1093/bioinformatics/btw587

Abstract

Motivation
The increasing adoption of clinical whole-genome resequencing (WGS) demands highly accurate and reproducible variant calling (VC) methods. The observed discordance between state-of-the-art VC pipelines, however, indicates that current practice still suffers from non-negligible numbers of false positive and false negative SNV and INDEL calls, which have been shown to be enriched both among discordant calls and in genomic regions with low sequence complexity.

Results
Here, we describe our method ReliableGenome (RG) for partitioning genomes into high and low concordance regions with respect to a set of surveyed VC pipelines. Our method combines call sets derived by multiple pipelines from arbitrary numbers of datasets and interpolates expected concordance for genomic regions without data. By applying RG to 219 deep human WGS datasets, we demonstrate that VC concordance depends predominantly on genomic context rather than on the actual sequencing data, which manifests as a high recurrence of regions that can or cannot be reliably genotyped by a single method. This makes it possible to apply pre-computed regions to other data created with comparable sequencing technology and software. RG outperforms comparable efforts in predicting VC concordance and false positive calls in low-concordance regions, which underlines its usefulness for variant filtering, annotation and prioritization. RG allows resource-intensive algorithms (e.g. consensus calling methods) to be focused on the smaller, discordant share of the genome (20–30%), which might result in increased overall accuracy at reasonable cost. Our method and analysis of discordant calls may further be useful for the development, benchmarking and optimization of VC algorithms and for the relative comparison of call sets between different studies and pipelines.

Availability and Implementation
RG was implemented in Java; source code and binaries are freely available for non-commercial use at https://github.com/popitsch/wtchg-rg/.

Supplementary information
Supplementary data are available at Bioinformatics online.
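The idea of partitioning a genome into high and low concordance regions can be illustrated with a deliberately simplified sketch. It is not RG's algorithm (the actual Java implementation lives in the repository linked above). Every element here is an illustrative assumption: the function names, the fixed window size, the definition of concordance as "called by every pipeline", the nearest-neighbour gap filling used as a stand-in for interpolation, and the 0.9 threshold.

```python
from collections import defaultdict

# Hypothetical call representation: (chrom, pos, ref, alt) tuples.
# call_sets_per_dataset: one entry per dataset, each a list of call sets
# (one Python set per VC pipeline run on that dataset).


def window_counts(call_sets_per_dataset, window_size):
    """Pool, over all datasets, the number of calls per window and how many
    of those calls were made by every pipeline (a simplistic concordance notion)."""
    total = defaultdict(int)
    agreed = defaultdict(int)
    for call_sets in call_sets_per_dataset:
        union = set().union(*call_sets)
        shared = set.intersection(*call_sets)
        for call in union:
            key = (call[0], call[1] // window_size)  # (chrom, window index)
            total[key] += 1
            if call in shared:
                agreed[key] += 1
    return total, agreed


def concordance_track(total, agreed, chrom, n_windows):
    """Per-window concordance for one chromosome; None where no calls exist."""
    track = []
    for i in range(n_windows):
        n = total.get((chrom, i), 0)
        track.append(agreed.get((chrom, i), 0) / n if n else None)
    return track


def fill_gaps(track):
    """Assumed interpolation: a window without data gets the mean of its
    nearest left and right windows that have data."""
    known = [i for i, v in enumerate(track) if v is not None]
    filled = list(track)
    for i, v in enumerate(track):
        if v is None and known:
            left = max((k for k in known if k < i), default=None)
            right = min((k for k in known if k > i), default=None)
            neighbours = [track[k] for k in (left, right) if k is not None]
            filled[i] = sum(neighbours) / len(neighbours)
    return filled


def partition(track, threshold=0.9):
    """Label windows as high/low concordance (the threshold is an assumption)."""
    return [None if v is None else ("high" if v >= threshold else "low") for v in track]


# Toy example: two pipelines, one dataset, 100 bp windows on chr1.
p1 = {("chr1", 120, "A", "G"), ("chr1", 250, "C", "T"), ("chr1", 480, "G", "A")}
p2 = {("chr1", 120, "A", "G"), ("chr1", 480, "G", "A"), ("chr1", 490, "T", "C")}
total, agreed = window_counts([[p1, p2]], window_size=100)
track = concordance_track(total, agreed, "chr1", n_windows=5)
print(track)                        # [None, 1.0, 0.0, None, 0.5]
print(partition(fill_gaps(track)))  # ['high', 'high', 'low', 'low', 'low']
```

Pooling counts over datasets before computing the ratio lets windows with few calls in any single dataset still receive an estimate. Windows with no calls in any dataset are left to the gap-filling step.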

Highlights

Whole-genome resequencing (WGS) allows researchers to address a broad range of clinical and research questions at comparatively low cost and with short turnaround times
The observed discordance between state-of-the-art variant calling (VC) pipelines indicates that current practice still suffers from non-negligible numbers of false positive and false negative SNV and INDEL calls, which have been shown to be enriched among discordant calls and in genomic regions with low sequence complexity
By applying ReliableGenome (RG) to 219 deep human WGS datasets, we demonstrate that VC concordance depends predominantly on genomic context rather than on the actual sequencing data, which manifests as a high recurrence of regions that can or cannot be reliably genotyped by a single method
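The recurrence argument in the last highlight can be made concrete with a small sketch. Given per-dataset high/low labels for the same windows, it counts the fraction of datasets in which each window is low-concordance. The label format, the function names and the 0.8 cut-off are hypothetical and are not taken from RG.

```python
from collections import Counter


def low_recurrence(labels_per_dataset):
    """Fraction of datasets in which each window was labelled 'low'.

    labels_per_dataset: list of dicts (one per dataset) mapping a window key,
    e.g. (chrom, window_index), to 'high' or 'low'. Hypothetical format.
    """
    seen = Counter()
    low = Counter()
    for labels in labels_per_dataset:
        for window, label in labels.items():
            seen[window] += 1
            if label == "low":
                low[window] += 1
    return {window: low[window] / seen[window] for window in seen}


def recurrent_low_windows(recurrence, min_fraction=0.8):
    """Windows that are low-concordance in at least min_fraction of datasets."""
    return sorted(w for w, r in recurrence.items() if r >= min_fraction)


# Toy example: three datasets, three windows on chr1.
datasets = [
    {("chr1", 0): "low", ("chr1", 1): "high", ("chr1", 2): "low"},
    {("chr1", 0): "low", ("chr1", 1): "high", ("chr1", 2): "high"},
    {("chr1", 0): "low", ("chr1", 1): "low", ("chr1", 2): "low"},
]
rec = low_recurrence(datasets)
print(recurrent_low_windows(rec))  # [('chr1', 0)]
```

Windows whose fraction is close to 0 or 1 behave consistently across datasets, which is the kind of recurrence the highlight refers to.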

Summary

Introduction

One proposed practice to improve overall VC accuracy is to apply multiple VC pipelines to the same sequencing data and combine the results in order to reach a consensus from multiple algorithms (Cantarel et al., 2014; Gezsi et al., 2015). While this strategy may significantly increase VC accuracy, it greatly increases analysis costs and turnaround times, which may be infeasible in many real-world situations. Such a consensus approach was used to develop the first genome-wide benchmarks, which enable us to determine VC accuracy and reproducibility and pave the way for systematically improving these measures (Goldfeder et al., 2016; Highnam et al., 2015; Zook et al., 2014). Removing all variant calls in difficult genomic regions, such as those with low sequence complexity, is straightforward and did not significantly compromise sensitivity in the author’s evaluation (Li, 2014).
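This paragraph mentions two strategies: consensus calling and removing calls in difficult regions. Both can be sketched as follows. Majority voting is only one simple consensus rule and is not claimed to be the scheme used by Cantarel et al. or Gezsi et al. The interval format (0-based, half-open, BED-like), the call tuples and all function names are assumptions made for illustration.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical call representation: (chrom, pos, ref, alt) with 0-based pos
# (subtract 1 from a VCF POS, which is 1-based).


def majority_consensus(call_sets, min_votes=None):
    """Keep calls reported by at least min_votes pipelines (default: strict majority)."""
    if min_votes is None:
        min_votes = len(call_sets) // 2 + 1
    votes = Counter(call for calls in call_sets for call in set(calls))
    return {call for call, n in votes.items() if n >= min_votes}


def build_index(intervals):
    """Group sorted, non-overlapping (chrom, start, end) intervals by chromosome."""
    index = {}
    for chrom, start, end in sorted(intervals):
        starts, ends = index.setdefault(chrom, ([], []))
        starts.append(start)
        ends.append(end)
    return index


def in_intervals(index, chrom, pos):
    """True if pos lies in some half-open [start, end) interval on chrom."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos < ends[i]


def remove_calls_in_regions(calls, regions):
    """Drop every call that falls into one of the given (difficult) regions."""
    index = build_index(regions)
    return {c for c in calls if not in_intervals(index, c[0], c[1])}


# Toy example: three pipelines, one difficult region chr1:[150, 250).
a = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")}
b = {("chr1", 100, "A", "G")}
c = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr1", 300, "G", "A")}
consensus = majority_consensus([a, b, c])
print(sorted(consensus))  # [('chr1', 100, 'A', 'G'), ('chr1', 200, 'C', 'T')]
print(remove_calls_in_regions(consensus, [("chr1", 150, 250)]))  # {('chr1', 100, 'A', 'G')}
```

The cost difference the paragraph describes shows up even here. The consensus step requires every pipeline to have been run, whereas region filtering needs only one call set and a binary search per call.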

Journal: Bioinformatics	Publication Date: Sep 7, 2016
Citations: 7	License type: CC BY 4.0