Reproducibility of Illumina platform deep sequencing errors allows accurate determination of DNA barcodes in cells.

Joost B Beltman,Ton N Schumacher,Shalin H Naik,Jos Urbanus,Jan C Rohr,Nienke Van Rooij,Arno Velds

doi:10.1186/s12859-016-0999-4

Abstract

Background: Next generation sequencing (NGS) of amplified DNA is a powerful tool to describe genetic heterogeneity within cell populations that can be used both to investigate the clonal structure of cell populations and to perform genetic lineage tracing. For applications in which both abundant and rare sequences are biologically relevant, the relatively high error rate of NGS techniques complicates data analysis, as it is difficult to distinguish rare true sequences from spurious sequences that are generated by PCR or sequencing errors. This issue, for instance, applies to cellular barcoding strategies that aim to follow the amount and type of offspring of single cells, by supplying these with unique heritable DNA tags.

Results: Here, we use genetic barcoding data from the Illumina HiSeq platform to show that straightforward read threshold-based filtering of data is typically insufficient to filter out spurious barcodes. Importantly, we demonstrate that specific sequencing errors occur at an approximately constant rate across different samples that are sequenced in parallel. We exploit this observation by developing a novel approach to filter out spurious sequences.

Conclusions: Application of our new method demonstrates its value in the identification of true sequences amongst spurious sequences in biological data sets.

Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-0999-4) contains supplementary material, which is available to authorized users.

Highlights

Next generation sequencing (NGS) of amplified DNA is a powerful tool to describe genetic heterogeneity within cell populations that can be used both to investigate the clonal structure of cell populations and to perform genetic lineage tracing
Overview of experimental barcoding technology: in cellular barcoding (Fig. 1), progenitor cells of interest are isolated from appropriate tissue and exposed to a library of retro- or lenti-viral vectors that each carry one DNA barcode from a large pool of barcodes
We present a novel approach to clean up barcoding data that does not require independent sequencing of a reference library, and that is based on our observation that individual sequencing errors occur at a predictable rate across samples in Illumina HiSeq data

Summary

Introduction

Next generation sequencing (NGS) of amplified DNA is a powerful tool to describe genetic heterogeneity within cell populations that can be used both to investigate the clonal structure of cell populations and to perform genetic lineage tracing. For applications in which both abundant and rare sequences are biologically relevant, the relatively high error rate of NGS techniques complicates data analysis, as it is difficult to distinguish rare true sequences from spurious sequences that are generated by PCR or sequencing errors. This issue, for instance, applies to cellular barcoding strategies that aim to follow the amount and type of offspring of single cells, by supplying these with unique heritable DNA tags. Quantification of the amount of offspring of a barcoded cell is achieved by PCR amplification, followed by next generation sequencing. Indexing of samples allows one to run many samples of different cell types, organs and time points within a single deep sequencing run, thereby allowing high-throughput acquisition of data [17].
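
Because many indexed samples share one sequencing run, the read ratio between a candidate error barcode and a more abundant near-identical barcode can be compared across samples. The sketch below is a rough illustration of that idea only, not the authors' published procedure: the function name `flag_spurious`, the one-mismatch neighbourhood, the coefficient-of-variation cutoff `max_cv` and the toy read counts are all assumptions made for this example.

```python
from statistics import mean, pstdev


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def flag_spurious(counts, max_cv=0.25, min_samples=2):
    """Flag barcodes whose reads track a more abundant one-mismatch neighbour.

    counts maps barcode -> list of read counts, one entry per sample.
    Returns {child: parent} for barcodes whose child/parent read ratio is
    roughly constant across samples (hypothetical criterion and cutoffs).
    """
    totals = {bc: sum(c) for bc, c in counts.items()}
    flagged = {}
    for child, child_counts in counts.items():
        for parent, parent_counts in counts.items():
            # A parent must be strictly more abundant overall.
            if totals[parent] <= totals[child]:
                continue
            if len(parent) != len(child) or hamming(parent, child) != 1:
                continue
            pairs = list(zip(child_counts, parent_counts))
            # Child reads in a sample lacking the parent cannot derive from it.
            if any(c > 0 and p == 0 for c, p in pairs):
                continue
            ratios = [c / p for c, p in pairs if p > 0]
            if len(ratios) < min_samples:
                continue
            m = mean(ratios)
            # Low relative spread of the ratio suggests a reproducible error.
            if m > 0 and pstdev(ratios) / m <= max_cv:
                flagged[child] = parent
                break
    return flagged


# Toy read counts for three samples (invented for illustration).
counts = {
    "ACGTACGT": [10000, 5000, 20000],  # abundant barcode
    "ACGTACGA": [100, 52, 195],        # about 1% of the barcode above in every sample
    "ACGTACGC": [3, 900, 0],           # one mismatch away, but ratio varies widely
    "TTTTGGGG": [40, 0, 800],          # rare, no one-mismatch neighbour
}
print(flag_spurious(counts))  # {'ACGTACGA': 'ACGTACGT'}
```

In this toy data a plain read threshold would treat `ACGTACGA` and `TTTTGGGG` alike, whereas the ratio criterion separates them; real data would need cutoffs chosen from the observed error rates.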

Journal: BMC Bioinformatics	Publication Date: Apr 2, 2016
Citations: 56	License type: CC BY 4.0
