Barcode identification for single cell genomics

Akshay Tambe, Lior Pachter

doi:10.1186/s12859-019-2612-0

Abstract

Background: Single-cell sequencing experiments use short DNA barcode ‘tags’ to identify reads that originate from the same cell. In order to recover single-cell information from such experiments, reads must be grouped based on their barcode tag, a crucial processing step that precedes other computations. However, this step can be difficult due to high rates of mismatch and deletion errors that can afflict barcodes.

Results: Here we present an approach to identify and error-correct barcodes by traversing the de Bruijn graph of circularized barcode k-mers. Our approach is based on the observation that circularizing a barcode sequence can yield error-free k-mers even when k is large relative to the length of the barcode sequence, a regime that is typical of single-cell barcoding applications. This allows for assignment of reads to consensus fingerprints constructed from k-mers.

Conclusion: We show that for single-cell RNA-Seq, circularization improves the recovery of accurate single-cell transcriptome estimates, especially when there is a high number of errors per read. This approach is robust to the type of error (mismatch, insertion, deletion), as well as to the relative abundances of the cells. Sircel, a software package that implements this approach, is described and publicly available.
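The observation about circularization can be illustrated with a small sketch. This is not the Sircel implementation: the function names, the 12-nt barcode, the single mismatch, and the choice of k = 8 are all hypothetical, chosen only to show why wrapping a barcode around on itself can leave some k-mers untouched by an error when k is large relative to the barcode length.

```python
def linear_kmers(barcode: str, k: int) -> list[str]:
    """All k-mers of the barcode read as an ordinary (linear) string."""
    return [barcode[i:i + k] for i in range(len(barcode) - k + 1)]


def circular_kmers(barcode: str, k: int) -> list[str]:
    """All k-mers of the barcode read as a circular string.

    The first k - 1 bases are appended to the end, so every position
    of the barcode starts exactly one k-mer.
    """
    if not 0 < k <= len(barcode):
        raise ValueError("k must be between 1 and the barcode length")
    extended = barcode + barcode[:k - 1]
    return [extended[i:i + k] for i in range(len(barcode))]


# Hypothetical 12-nt barcode and an observed copy with one mismatch
# (C -> G at 0-based position 6). Both sequences are made up.
true_barcode = "ACGTTGCAAGCT"
observed = "ACGTTGGAAGCT"
k = 8

# k-mers of the observed read that still match the true barcode.
shared_linear = set(linear_kmers(true_barcode, k)) & set(linear_kmers(observed, k))
shared_circular = set(circular_kmers(true_barcode, k)) & set(circular_kmers(observed, k))

print(len(shared_linear))    # 0: every linear 8-mer overlaps the error
print(len(shared_circular))  # 4: the 8-mers starting at positions 7 to 10 avoid it
```

In this toy case each of the five linear 8-mers covers the erroneous base, so none survives, while four of the twelve circular 8-mers wrap around the end of the barcode and skip it. Those error-free k-mers are the kind of evidence a graph built over many reads could use to recover the underlying barcode.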

Highlights

Single-cell sequencing experiments use short DNA barcode ‘tags’ to identify reads that originate from the same cell
In the Drop-Seq protocol, which is a popular microfluidic-based single-cell experimental platform, DNA barcodes are synthesized on a solid bead support, using split-and-pool DNA synthesis [10], and this approach has been applied to obtain single-cell transcriptome profiles from a number of model- and non-model organisms [3, 6, 13, 16, 19, 21]
We have shown how a de Bruijn graph formulation of the barcode calling problem based on circularization of input sequences is a useful approach to identify and error-correct barcode sequences

Summary

Introduction

Single-cell sequencing experiments use short DNA barcode ‘tags’ to identify reads that originate from the same cell. In order to recover single-cell information from such experiments, reads must be grouped based on their barcode tag, a crucial processing step that precedes other computations. This step can be difficult due to high rates of mismatch and deletion errors that can afflict barcodes. Tagging of sequencing reads with short DNA barcodes is a common experimental practice that enables a pooled sequencing library to be separated into biologically meaningful partitions. This technique is the cornerstone of many single-cell sequencing experiments, where reads originating from individual cells are tagged with cell-specific barcodes; as such, the first step in any single-cell sequencing experiment involves separating reads by barcode to recover single-cell profiles. Some current approaches require that the approximate number of cells in the experiment be known beforehand, and in some experimental contexts such information is not obtained.

Methods

Results

Discussion

Conclusion

Journal: BMC Bioinformatics	Publication Date: Jan 17, 2019
Citations: 20	License type: open-access
