Abstract

Background

Sequencing errors are key confounding factors for detecting low-frequency genetic variants that are important for cancer molecular diagnosis, treatment, and surveillance using deep next-generation sequencing (NGS). However, a comprehensive understanding of the errors introduced at the various steps of a conventional NGS workflow, such as sample handling, library preparation, PCR enrichment, and sequencing, is still lacking. In this study, we use current NGS technology to systematically investigate these questions.

Results

By evaluating read-specific error distributions, we discover that the substitution error rate can be computationally suppressed to 10⁻⁵ to 10⁻⁴, which is 10- to 100-fold lower than the rate generally considered achievable (10⁻³) in the current literature. We then quantify the substitution errors attributable to sample handling, library preparation, enrichment PCR, and sequencing by using multiple deep sequencing datasets. We find that error rates differ by nucleotide substitution type, ranging from 10⁻⁵ for A>C/T>G, C>A/G>T, and C>G/G>C changes to 10⁻⁴ for A>G/T>C changes. Furthermore, C>T/G>A errors exhibit strong sequence context dependency, sample-specific effects dominate elevated C>A/G>T errors, and target-enrichment PCR leads to a ~6-fold increase in the overall error rate. We also find that more than 70% of hotspot variants can be detected at frequencies of 0.01 to 0.1% with current NGS technology by applying in silico error suppression.

Conclusions

We present the first comprehensive analysis of sequencing error sources in conventional NGS workflows. The error profiles revealed by our study highlight new directions for further improving NGS analysis accuracy, both experimentally and computationally, ultimately enhancing the precision of deep sequencing.
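To make the strand-collapsed, per-class error rates above concrete, here is a minimal Python sketch of how such rates could be tallied from mismatch counts and coverage. All counts, coverages, and the pooling scheme are illustrative assumptions for exposition, not data or code from the paper.

```python
# Sketch: estimating per-substitution-class background error rates from
# pileup-style base counts. Numbers below are hypothetical and chosen only
# so the resulting rates fall in the 1e-5 to 1e-4 range the abstract reports.

# Hypothetical tallies: (reference_base, observed_base) -> non-reference
# base-call count, plus total sequenced bases per reference base.
mismatch_counts = {
    ("A", "G"): 950, ("T", "C"): 910,   # A>G/T>C transitions
    ("C", "T"): 620, ("G", "A"): 640,   # C>T/G>A (context-dependent per the paper)
    ("C", "A"): 55,  ("G", "T"): 60,    # C>A/G>T
    ("A", "C"): 40,  ("T", "G"): 38,    # A>C/T>G
    ("C", "G"): 30,  ("G", "C"): 28,    # C>G/G>C
}
coverage = {"A": 4.1e6, "C": 3.9e6, "G": 3.9e6, "T": 4.1e6}

# Pool reverse-complement-equivalent substitutions into the strand-collapsed
# classes named in the abstract (e.g., A>G/T>C).
CLASS_PAIRS = {
    "A>G/T>C": [("A", "G"), ("T", "C")],
    "C>T/G>A": [("C", "T"), ("G", "A")],
    "C>A/G>T": [("C", "A"), ("G", "T")],
    "A>C/T>G": [("A", "C"), ("T", "G")],
    "C>G/G>C": [("C", "G"), ("G", "C")],
}

for name, pairs in CLASS_PAIRS.items():
    errors = sum(mismatch_counts.get(p, 0) for p in pairs)
    depth = sum(coverage[ref] for ref, _ in pairs)
    print(f"{name}: {errors / depth:.2e}")
```

With these hypothetical inputs, the A>G/T>C class lands near 10⁻⁴ while the remaining classes land near 10⁻⁵, mirroring the spread of rates described in the abstract.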

Highlights

  • Sequencing errors are key confounding factors for detecting low-frequency genetic variants that are important for cancer molecular diagnosis, treatment, and surveillance using deep next-generation sequencing (NGS)

  • We systematically investigated substitution error profiles by analyzing multiple sequencing datasets from five DNA sequencing providers: three deep sequencing datasets generated by St. Jude Children's Research Hospital (St. Jude), HudsonAlpha Institute of Biotechnology (HAIB), and WuXiNextCode, and whole-exome sequencing datasets generated by the Broad Institute (BI) and Baylor College of Medicine (BCM), on five different Illumina sequencing platforms (Additional file 1: Table S1)



Introduction

Errors acquired during next-generation sequencing (NGS) are key confounding factors for the sensitive detection of low-frequency variants by deep sequencing. The substitution error rate of conventional NGS was first reported to be > 0.1% in 2011 [10], and similar rates were reported in later studies [11, 12] and in a recent review [1]. This presumed high error rate (> 0.1%) has constrained further exploration of ways to improve the sensitivity of low-frequency variant detection. With the rapid progress in sequencing technology and the dramatic reduction in sequencing cost, there is a great need to systematically evaluate the errors introduced at each step of a conventional NGS workflow.
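The practical impact of the background error rate on detection sensitivity can be illustrated with a simple binomial model; this is a sketch for intuition, not the paper's method. Given a sequencing depth and a per-base error rate, it asks how many variant-supporting reads must be observed before the call is unlikely to be error alone. The function name, depth, and significance threshold are hypothetical choices.

```python
# Sketch: how a background substitution error rate of ~1e-3 vs ~1e-5 changes
# the smallest detectable allele frequency. We model the count of
# non-reference reads at a site as Binomial(depth, error_rate) and find the
# smallest read count that is improbable under error alone (one-sided p < 0.01).

from scipy.stats import binom

def min_supporting_reads(depth: int, error_rate: float, alpha: float = 0.01) -> int:
    """Smallest k such that P(X >= k) < alpha when X ~ Binomial(depth, error_rate)."""
    k = 0
    # binom.sf(k - 1, n, p) equals P(X >= k)
    while binom.sf(k - 1, depth, error_rate) >= alpha:
        k += 1
    return k

depth = 10_000  # illustrative deep-sequencing coverage
for err in (1e-3, 1e-4, 1e-5):
    k = min_supporting_reads(depth, err)
    print(f"error rate {err:.0e}: need >= {k} supporting reads "
          f"(~{k / depth:.2%} allele frequency) at {depth}x")
```

Under this toy model, a 10⁻³ error rate pushes the detection floor to roughly 0.2% allele frequency at 10,000x, while suppressing errors to 10⁻⁵ brings it down to a few hundredths of a percent, consistent with the motivation for the error-suppression analysis described here.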


