Enhancing the detection of barcoded reads in high throughput DNA sequencing data by controlling the false discovery rate.

Tilo Buschmann, Leonid V Bystrykh, Rong Zhang, Douglas E Brash

doi:10.1186/1471-2105-15-264

Abstract

Background: DNA barcodes are short unique sequences used to label DNA or RNA-derived samples in multiplexed deep sequencing experiments. During the demultiplexing step, barcodes must be detected and their position identified. In some cases (e.g., with PacBio SMRT), the position of the barcode and DNA context is not well defined. Many reads start inside the genomic insert so that adjacent primers might be missed. The matter is further complicated by coincidental similarities between barcode sequences and reference DNA. Therefore, a robust strategy is required in order to detect barcoded reads and avoid a large number of false positives or negatives.

For mass inference problems such as this one, false discovery rate (FDR) methods are powerful and balanced solutions. Since existing FDR methods cannot be applied to this particular problem, we present an adapted FDR method that is suitable for the detection of barcoded reads and suggest possible improvements.

Results: In our analysis, barcode sequences showed high rates of coincidental similarities with the Mus musculus reference DNA. This problem became more acute when the length of the barcode sequence decreased and the number of barcodes in the set increased. The method presented in this paper controls the tail area-based false discovery rate to distinguish between barcoded and unbarcoded reads. This method helps to establish the highest acceptable minimal distance between reads and barcode sequences. In a proof-of-concept experiment we correctly detected barcodes in 83% of the reads with a precision of 89%. Sensitivity improved to 99% at 99% precision when the adjacent primer sequence was incorporated in the analysis. The analysis was further improved using a paired-end strategy.

Following an analysis of the data for sequence variants induced in the Atp1a1 gene of C57BL/6 murine melanocytes by ultraviolet light and conferring resistance to ouabain, we found no evidence of cross-contamination of DNA material between samples.

Conclusion: Our method offers a proper quantitative treatment of the problem of detecting barcoded reads in a noisy sequencing environment. It is based on false discovery rate statistics, which allow a proper trade-off between sensitivity and precision to be chosen.

Electronic supplementary material: The online version of this article (doi:10.1186/1471-2105-15-264) contains supplementary material, which is available to authorized users.
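The idea of choosing a distance threshold by controlling the tail area-based FDR can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes Hamming distance instead of whatever distance the paper uses, takes the null distribution from minimal distances measured on sequences known to be unbarcoded, and conservatively sets the null proportion `pi0` to 1. All function names are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def min_distance(read_prefix, barcodes):
    """Smallest distance from a read prefix to any barcode in the set."""
    return min(hamming(read_prefix, bc) for bc in barcodes)


def tail_area_fdr(observed, null, pi0=1.0):
    """Estimate Fdr(d) = pi0 * F0(d) / F(d) for the rule 'call barcoded if distance <= d'.

    observed: minimal distances of the real reads (barcoded and unbarcoded mixed)
    null:     minimal distances of sequences assumed to carry no barcode
    pi0:      assumed proportion of unbarcoded reads (1.0 is the conservative choice)
    """
    max_d = max(max(observed), max(null))
    fdr = {}
    for d in range(max_d + 1):
        f_obs = sum(x <= d for x in observed) / len(observed)
        f_null = sum(x <= d for x in null) / len(null)
        if f_obs > 0:  # Fdr is undefined when no read is called at this threshold
            fdr[d] = min(1.0, pi0 * f_null / f_obs)
    return fdr


def largest_threshold(fdr, alpha):
    """Largest distance threshold whose estimated Fdr does not exceed alpha."""
    ok = [d for d, q in fdr.items() if q <= alpha]
    return max(ok) if ok else None
```

With toy inputs `observed = [0, 0, 0, 1, 1, 3, 4, 4, 5, 5]` and `null = [2, 3, 3, 4, 4, 4, 5, 5, 5, 5]`, `largest_threshold(tail_area_fdr(observed, null), 0.25)` returns 2: allowing up to two mismatches keeps the estimated share of false calls at 20%, while a threshold of three would raise it to 50%.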

Highlights

DNA barcodes are short unique sequences used to label DNA or RNA-derived samples in multiplexed deep sequencing experiments
Detection of barcoded reads is only the first step in the demultiplexing pipeline, so we further investigated the quality of correcting errors in the detected barcode sequence and assigning reads to their original samples
If no information about inserts is available and only known barcode sequences are used for barcode detection, the results suggest using at least 10-nt-long barcodes for 20 samples, 11-nt-long barcodes for 50 samples, 12-nt-long barcodes for 150 samples and even longer barcodes for larger sample sizes
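Why shorter barcodes and larger barcode sets produce more coincidental matches can be illustrated with a back-of-the-envelope calculation. The sketch below is not taken from the paper: it assumes uniformly random, independent bases (real genomic DNA is not uniform), Hamming distance at a fixed position, and independence between barcodes. The function names are hypothetical.

```python
from math import comb


def p_single(n, t):
    """P(a uniform random n-mer lies within Hamming distance t of one fixed barcode).

    There are C(n, k) * 3**k sequences at exactly k mismatches out of 4**n in total.
    """
    return sum(comb(n, k) * 3**k for k in range(t + 1)) / 4**n


def p_any(n, t, m):
    """Approximate P(a random n-mer lies within distance t of at least one of m barcodes).

    Treats the m barcodes as independent, which is only an approximation.
    """
    return 1.0 - (1.0 - p_single(n, t)) ** m


if __name__ == "__main__":
    # Barcode lengths and set sizes taken from the recommendation above;
    # the tolerance of one mismatch is an arbitrary choice for illustration.
    for n, m in [(10, 20), (11, 50), (12, 150)]:
        print(f"length={n} barcodes={m} p_any={p_any(n, 1, m):.2e}")
```

Under these assumptions the chance match probability grows roughly linearly with the number of barcodes and shrinks by about a factor of four per added nucleotide, which is consistent with the direction of the recommendation above.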

Summary

Introduction

DNA barcodes are short unique sequences used to label DNA or RNA-derived samples in multiplexed deep sequencing experiments. Multiplexed deep sequencing is a cost-saving and timesaving technology used with Next Generation Sequencing that combines and sequences multiple DNA samples as one. This method relies on labeling genomic sequences from separate samples with specific tags, known as DNA barcodes. Even with the best possible barcode design, recognition of short barcode sequences in the DNA context is often problematic. The main strategy for recovering short barcodes relies on the sequence identity and on the expected position of the barcode, which is usually found at the beginning of the sequence, either behind a sequencing primer or in front of a PCR primer. The beginning of the barcode may be shifted by one or more positions, which appears as an insertion or deletion error in the barcode.
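The position-based recovery strategy described above can be sketched as a small search over start offsets. This is a simplified illustration, not the paper's algorithm: it assumes the barcode sits within a few bases of the read start, scans offsets instead of modelling insertions and deletions explicitly, and uses Hamming distance. `locate_barcode` and `max_shift` are hypothetical names.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def locate_barcode(read, barcodes, max_shift=2):
    """Return (distance, offset, barcode) for the best-matching window near the read start.

    Every offset from 0 to max_shift is tried against every barcode; ties are
    broken by smaller offset, then alphabetically by barcode.
    """
    candidates = []
    for offset in range(max_shift + 1):
        for bc in barcodes:
            window = read[offset:offset + len(bc)]
            if len(window) == len(bc):  # skip windows that run past the read end
                candidates.append((hamming(window, bc), offset, bc))
    return min(candidates) if candidates else None
```

For example, `locate_barcode("GACGTACGGGGG", ["ACGTAC", "TTGCAA"])` returns `(0, 1, "ACGTAC")`: the barcode is found without mismatches once the window is shifted by one base. The returned distance is the kind of quantity to which a detection threshold would then be applied.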

Journal: BMC Bioinformatics
Publication Date: Aug 7, 2014
Citations: 44
License type: CC BY
