BayesHammer: Bayesian clustering for error correction in single-cell sequencing.

Sergey I Nikolenko, Anton I Korobeynikov, Max A Alekseyev

doi:10.1186/1471-2164-14-s1-s7

Open Access

https://doi.org/10.1186/1471-2164-14-s1-s7

Abstract

Error correction of sequenced reads remains a difficult task, especially in single-cell sequencing projects with extremely non-uniform coverage. Existing error correction tools designed for standard (multi-cell) sequencing data usually come up short in single-cell sequencing projects, while the algorithms actually used for single-cell error correction have so far been very simplistic. We introduce several novel algorithms based on Hamming graphs and Bayesian subclustering in our new error correction tool BayesHammer. Although BayesHammer was designed for single-cell sequencing, we demonstrate that it also improves on existing error correction tools for multi-cell sequencing data while working much faster on real-life datasets. We benchmark BayesHammer on both k-mer counts and actual assembly results with the SPAdes genome assembler.
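The abstract names Hamming graphs as a building block but does not define them here. The sketch below shows one common reading, under stated assumptions.

- Every distinct k-mer is a vertex.
- Two k-mers are joined by an edge when their Hamming distance is at most a hypothetical parameter `tau`.
- Clusters are the connected components of this graph, found with union-find.

The function names `count_kmers`, `hamming`, `hamming_clusters` and `cluster_centers` are illustrative and are not taken from BayesHammer. The all-pairs comparison is quadratic, so it only suits toy inputs. `cluster_centers` simply picks the most frequent k-mer in each cluster. That step is a stand-in for the Bayesian subclustering step, which this sketch does not reproduce.

```python
from collections import Counter
from itertools import combinations


def count_kmers(reads, k):
    """Count every k-mer (substring of length k) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def hamming_clusters(kmers, tau=1):
    """Connected components of the Hamming graph on `kmers`.

    Vertices are distinct k-mers; an edge joins two k-mers whose Hamming
    distance is at most `tau`. Components are merged with union-find after a
    naive all-pairs comparison, which is O(n^2) and only meant for toy inputs.
    """
    kmers = sorted(kmers)
    parent = {km: km for km in kmers}

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(kmers, 2):
        if hamming(a, b) <= tau:
            parent[find(a)] = find(b)

    groups = {}
    for km in kmers:
        groups.setdefault(find(km), []).append(km)
    return sorted(groups.values())


def cluster_centers(clusters, counts):
    """Simplification: take the most frequent k-mer of each cluster as its center."""
    return [max(cluster, key=lambda km: counts[km]) for cluster in clusters]


if __name__ == "__main__":
    reads = ["ACGTACGT", "ACGTACGT", "ACGAACGT"]  # toy reads; the third has one substitution
    counts = count_kmers(reads, k=4)
    clusters = hamming_clusters(counts, tau=1)
    print(clusters)                           # [['AACG', 'TACG'], ['ACGA', 'ACGT'], ['CGAA', 'CGTA'], ['GAAC', 'GTAC']]
    print(cluster_centers(clusters, counts))  # ['TACG', 'ACGT', 'CGTA', 'GTAC']
```

On these toy reads, each 4-mer produced by the substitution in the third read lands in the same cluster as the abundant k-mer it most likely came from. For example, `ACGA` clusters with `ACGT`.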

Highlights

Single-cell sequencing [1,2] based on the Multiple Displacement Amplification (MDA) technology [1,3] allows one to sequence genomes of important uncultivated bacteria that until recently had been viewed as unamenable to genome sequencing.
We introduce the BayesHammer error correction tool, which does not rely on uniform coverage.
Paired-end libraries were generated by an Illumina Genome Analyzer IIx from MDA-amplified single-cell DNA and from multi-cell genomic DNA prepared from cultured E. coli, respectively. These datasets consist of 100 bp paired-end reads with insert size 220 bp. Both E. coli datasets have average coverage ≈ 600×, and the coverage is highly non-uniform in the single-cell case.
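Many error correction tools treat a k-mer as trustworthy only if it occurs at least a fixed number of times. The toy sketch below shows why a single global threshold breaks down when coverage is as uneven as in the single-cell case. It uses made-up counts, not data from these datasets. In it, a genuine k-mer from a poorly covered region is discarded, while an erroneous k-mer from a region covered hundreds of times survives. The function names, counts and threshold value are all hypothetical.

```python
def global_threshold_filter(counts, threshold):
    """Keep k-mers whose count reaches a single, genome-wide threshold."""
    return {kmer for kmer, count in counts.items() if count >= threshold}


def threshold_mistakes(kept, genuine):
    """Compare the filter's output with the (normally unknown) set of true k-mers."""
    lost = sorted(genuine - kept)      # true k-mers wrongly discarded
    spurious = sorted(kept - genuine)  # erroneous k-mers wrongly kept
    return lost, spurious


# Hypothetical counts: one low-coverage region and one high-coverage region.
toy_counts = {
    "ACGTA": 2,    # genuine k-mer, region covered only ~2x
    "ACGTT": 1,    # sequencing error in the low-coverage region
    "GGCCA": 900,  # genuine k-mer, region covered ~900x
    "GGCCT": 4,    # sequencing error that recurs because coverage is so high
}
toy_genuine = {"ACGTA", "GGCCA"}

kept = global_threshold_filter(toy_counts, threshold=3)
lost, spurious = threshold_mistakes(kept, toy_genuine)
print("discarded genuine k-mers:", lost)   # ['ACGTA']
print("kept erroneous k-mers:", spurious)  # ['GGCCT']
```

No single threshold fixes both mistakes here:

- Any threshold low enough to keep `ACGTA` (at most 2) also keeps `GGCCT`, whose count is 4.
- Any threshold high enough to drop `GGCCT` (at least 5) also drops `ACGTA`.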

Summary

Introduction

Single-cell sequencing [1,2] based on the Multiple Displacement Amplification (MDA) technology [1,3] allows one to sequence genomes of important uncultivated bacteria that until recently had been viewed as unamenable to genome sequencing. Existing metagenomic approaches (aimed at genes rather than genomes) are clearly limited for studies of such bacteria, even though these bacteria represent the majority of species in such important studies as the Human Microbiome Project [4,5] or the discovery of new antibiotic-producing bacteria [6]. Single-cell sequencing datasets have extremely non-uniform coverage that may vary from single digits to thousands along a single genome (Figure 1). For many existing error correction tools, most notably Quake [7], uniform coverage is a prerequisite: with non-uniform coverage they either do not work or produce poor results. Error correction tools often employ a simple idea of discarding rare k-mers, which

Journal: BMC Genomics	Publication Date: Jan 1, 2013
Citations: 411	License type: cc-by

Similar Papers

A comprehensive evaluation of long read error correction methods
Haowen Zhang ... Srinivas Aluru
BMC Genomics | VOL. 21 | 01 Dec 2020

Illumina error correction near highly repetitive DNA regions improves de novo genome assembly
Mahdi Heydari ... Jan Fostier
BMC Bioinformatics | VOL. 20 | 03 Jun 2019

Comprehensive Evaluation of Error-Correction Methodologies for Genome Sequencing Data
Yun Heo ... Gowthami Manikandan
18 Mar 2021

Improved Error Correction of NGS Data
Andrei Stefan Alic
15 Jul 2016