Abstract

Background: As next-generation sequencing technologies make their way into the clinic, knowledge of their error rates is essential if they are to be used to guide patient care. However, sequencing platforms and variant-calling pipelines are continuously evolving, making it difficult to accurately quantify error rates for the particular combination of assay and software parameters used on each sample. Family data provide a unique opportunity for estimating sequencing error rates, since they allow us to observe a fraction of sequencing errors as Mendelian errors in the family, which we can then use to produce genome-wide error estimates for each sample.

Results: We introduce a method that uses Mendelian errors in sequencing data to make highly granular per-sample estimates of precision and recall for any set of variant calls, regardless of sequencing platform or calling methodology. We validate the accuracy of our estimates using monozygotic twins, and we use a set of monozygotic quadruplets to show that our predictions closely match the consensus method. We demonstrate our method’s versatility by estimating sequencing error rates for whole genome sequencing, whole exome sequencing, and microarray datasets, and we highlight its sensitivity by quantifying performance increases between different versions of the GATK variant-calling pipeline. We then use our method to demonstrate that:
1) Sequencing error rates between samples in the same dataset can vary by over an order of magnitude.
2) Variant calling performance decreases substantially in low-complexity regions of the genome.
3) Variant calling performance in whole exome sequencing data decreases with distance from the nearest target region.
4) Variant calls from lymphoblastoid cell lines can be as accurate as those from whole blood.
5) Whole-genome sequencing can attain microarray-level precision and recall at disease-associated SNV sites.

Conclusion: Genotype datasets from families are powerful resources that can be used to make fine-grained estimates of sequencing error for any sequencing platform and variant-calling methodology.
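
To make the notion of a Mendelian error concrete, the following is a minimal sketch of how such errors could be flagged in a mother-father-child trio. It is an illustration only, not the estimation method introduced in the paper. It assumes autosomal, biallelic sites with no missing calls, and genotypes encoded as alternate-allele counts (0, 1, or 2). The names `GAMETES`, `is_mendelian_error`, and `mendelian_error_rate` are hypothetical.

```python
from itertools import product

# Alleles a parent can transmit, keyed by that parent's alternate-allele count.
# Assumption: autosomal, biallelic site; 0 = reference allele, 1 = alternate allele.
GAMETES = {0: {0}, 1: {0, 1}, 2: {1}}


def is_mendelian_error(mother, father, child):
    """Return True if the child's genotype cannot be inherited from the parents."""
    # Every child genotype reachable by taking one allele from each parent.
    possible = {m + f for m, f in product(GAMETES[mother], GAMETES[father])}
    return child not in possible


def mendelian_error_rate(trio_calls):
    """Fraction of (mother, father, child) genotype triples that are inconsistent."""
    calls = list(trio_calls)
    if not calls:
        return 0.0
    errors = sum(is_mendelian_error(m, f, c) for m, f, c in calls)
    return errors / len(calls)
```

Only some genotyping errors show up this way: if both parents are called heterozygous, every child genotype is consistent, so an error at that site goes unobserved. This is why Mendelian errors expose only a fraction of sequencing errors. True de novo variants would also be flagged by a check like this one.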

Highlights

  • As next-generation sequencing technologies make their way into the clinic, knowledge of their error rates is essential if they are to be used to guide patient care

  • Validating sequencing error rate estimates using monozygotic twins: We begin by validating our family-based error-estimation method using monozygotic twins

  • Differences in the distribution of sequencing depth across the genome, observed between lymphoblastoid cell line (LCL)-derived and whole-blood-derived samples [23], may contribute to differences in variant calling performance. These results suggest that the LCL samples from the iHART dataset are faithful representations of the DNA of their donors, provided that low-complexity regions of the genome are excluded

Summary

Introduction

As next-generation sequencing technologies make their way into the clinic, knowledge of their error rates is essential if they are to be used to guide patient care. In order to responsibly use the results of genetic testing in patient treatment, clinicians need good estimates of the likelihood of false positive and false negative test results [1]. Obtaining such estimates is a major obstacle to moving next-generation sequencing methods into the clinic, since variant calls are highly dependent upon the details of the sequencing assay itself and on the software pipeline used to analyze the data [2]. While best practices have been established [3], software pipelines are continuously evolving, with new versions released every few years. This makes it difficult to estimate error rates for the exact combination of sequencing platform and software pipeline used to generate data for each patient. The consensus method has been used to quantify the performance of sequencing platforms [7], aligners [8, 9], and variant calling algorithms [10, 11].
