Gencore: an efficient tool to generate consensus reads for error suppressing and duplicate removing of NGS data

Shifu Chen, Jia Gu, Zhicheng Li, Yanqing Zhou, Wenting Liao, Yaru Chen, Yun Xu, Tanxiao Huang

doi:10.1186/s12859-019-3280-9

Abstract

Background

Removing duplicates might be considered a well-resolved problem in the next-generation sequencing (NGS) data processing domain. However, as NGS technology gains more recognition in clinical applications, researchers have started to pay more attention to its sequencing errors, and prefer to remove these errors while performing deduplication. Recently, a new technology called unique molecular identifier (UMI) has been developed to better identify sequencing reads derived from different DNA fragments. Most existing duplicate removal tools cannot handle UMI-integrated data. Some modern tools can work with UMIs, but are usually slow and use too much memory. Furthermore, existing tools rarely report rich statistical results, which are very important for quality control and downstream analysis. These unmet requirements drove us to develop an ultra-fast, simple, lightweight but powerful tool for duplicate removal and sequencing error suppression, with the ability to handle UMIs and report informative results.

Results

This paper presents gencore, an efficient tool for duplicate removal and sequencing error suppression of NGS data. This tool clusters the mapped sequencing reads and merges the reads in each cluster to generate a single consensus read. While the consensus read is generated, the random errors introduced by library construction and sequencing can be removed. This error-suppressing feature makes gencore very suitable for detecting ultra-low frequency mutations from deep sequencing data. When UMI technology is applied, gencore can use the UMIs to identify the reads derived from the same original DNA fragment. Gencore reports statistical results in both HTML and JSON formats. The HTML report contains many interactive figures plotting statistical coverage and duplication information. The JSON report contains all the statistical results, and is interpretable by downstream programs.

Conclusions

Compared with conventional tools like Picard and SAMtools, gencore greatly reduces the output data's mapping mismatches, which are mostly caused by errors. Compared with some new tools like UMI-Reducer and UMI-tools, gencore runs much faster, uses less memory, generates better consensus reads and provides simpler interfaces. To the best of our knowledge, gencore is the only duplicate removal tool that generates both informative HTML and JSON reports. This tool is available at: https://github.com/OpenGene/gencore
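The cluster-then-merge idea described in the abstract can be illustrated with a toy sketch. This is not gencore's actual implementation: the read representation (dictionaries with `chrom`, `start`, `end`, `umi` and `seq` keys), the function names, the exact-match clustering key and the simple per-position majority vote are all assumptions made for illustration, and the sketch ignores base qualities, paired-end structure and UMI mismatches.

```python
from collections import Counter, defaultdict


def cluster_reads(reads):
    """Group reads that share mapping coordinates and UMI (hypothetical key)."""
    clusters = defaultdict(list)
    for read in reads:
        key = (read["chrom"], read["start"], read["end"], read.get("umi", ""))
        clusters[key].append(read["seq"])
    return clusters


def consensus(seqs):
    """Majority vote per position; emit 'N' when the top two bases tie.

    Assumes all sequences in the cluster have the same length.
    """
    bases = []
    for column in zip(*seqs):
        top = Counter(column).most_common(2)
        if len(top) == 2 and top[0][1] == top[1][1]:
            bases.append("N")
        else:
            bases.append(top[0][0])
    return "".join(bases)


def deduplicate(reads):
    """Return one consensus sequence per cluster."""
    return {key: consensus(seqs) for key, seqs in cluster_reads(reads).items()}


if __name__ == "__main__":
    reads = [
        {"chrom": "chr1", "start": 100, "end": 108, "umi": "AAT", "seq": "ACGTACGT"},
        {"chrom": "chr1", "start": 100, "end": 108, "umi": "AAT", "seq": "ACGTACGT"},
        # Same fragment, with a single random error at the fourth base.
        {"chrom": "chr1", "start": 100, "end": 108, "umi": "AAT", "seq": "ACGAACGT"},
        # Same coordinates but a different UMI: treated as a distinct fragment.
        {"chrom": "chr1", "start": 100, "end": 108, "umi": "CCG", "seq": "ACGTACGA"},
    ]
    for key, seq in deduplicate(reads).items():
        print(key, seq)
```

In this toy input, the three reads sharing the UMI `AAT` collapse into one consensus read in which the isolated error is outvoted, while the read carrying the UMI `CCG` is kept as a separate fragment even though it maps to the same coordinates.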

Highlights

High-depth next-generation sequencing (NGS) has been widely used for precision cancer diagnosis and treatment [1].
Since the tumor-derived DNA is usually a small part of the total blood cell-free DNA, the mutant allele frequency (MAF) of a variant detected from circulating tumor DNA (ctDNA) sequencing data can be very low.
To better identify sequencing reads derived from different DNA fragments, a technology called unique molecular identifier (UMI) has been developed.



Journal: BMC Bioinformatics
Publication Date: Dec 1, 2019
Citations: 54
License type: open-access
