A computational method for estimating the PCR duplication rate in DNA and RNA-seq experiments

Vikas Bansal

doi:10.1186/s12859-017-1471-9

Abstract

Background: PCR amplification is an important step in the preparation of DNA sequencing libraries prior to high-throughput sequencing. PCR amplification introduces redundant reads in the sequence data, and estimating the PCR duplication rate is important to assess the frequency of such reads. Existing computational methods do not distinguish PCR duplicates from “natural” read duplicates that represent independent DNA fragments and therefore overestimate the PCR duplication rate for DNA-seq and RNA-seq experiments.

Results: In this paper, we present a computational method to estimate the average PCR duplication rate of high-throughput sequence datasets that accounts for natural read duplicates by leveraging heterozygous variants in an individual genome. Analysis of simulated data and exome sequence data from the 1000 Genomes project demonstrated that our method can accurately estimate the PCR duplication rate on paired-end as well as single-end read datasets that contain a high proportion of natural read duplicates. Further, analysis of exome datasets prepared using the Nextera library preparation method indicated that 45–50% of read duplicates correspond to natural read duplicates, likely due to fragmentation bias. Finally, analysis of RNA-seq datasets from individuals in the 1000 Genomes project demonstrated that 70–95% of read duplicates observed in such datasets correspond to natural duplicates sampled from genes with high expression, and identified outlier samples with a 2-fold greater PCR duplication rate than other samples.

Conclusions: The method described here is a useful tool for estimating the PCR duplication rate of high-throughput sequence datasets and for assessing the fraction of read duplicates that correspond to natural read duplicates. An implementation of the method is available at https://github.com/vibansal/PCRduplicates.

Highlights

PCR amplification is an important step in the preparation of DNA sequencing libraries prior to high-throughput sequencing
Each cluster of read duplicates is a combination of natural read duplicates and PCR duplicates
PCR amplification is a necessary step in the preparation of DNA sequencing libraries for most high-throughput sequencing instruments
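
The idea of using heterozygous variants to separate natural from PCR duplicates can be illustrated with a deliberately simplified sketch. It is not the implementation from the PCRduplicates repository; the function name and input format are hypothetical, and it assumes duplicate clusters of exactly two reads, equal sampling of both alleles, and no sequencing errors. Under those assumptions, two PCR copies of one fragment always carry the same allele at a heterozygous site, while two independent fragments carry different alleles about half the time, so the natural-duplicate fraction is roughly twice the fraction of discordant pairs. The result is the PCR share among these duplicate pairs, not the library-wide duplication rate.

```python
from typing import Iterable, Tuple


def estimate_pcr_fraction(pairs: Iterable[Tuple[str, str]]) -> float:
    """Estimate the fraction of size-2 duplicate clusters that are PCR duplicates.

    `pairs` holds, for each duplicate cluster of two reads overlapping a
    heterozygous site, the allele observed on each read (hypothetical input
    format chosen for this illustration).
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one duplicate pair at a heterozygous site")

    # Pairs whose reads carry different alleles must be independent fragments.
    discordant = sum(1 for a, b in pairs if a != b)
    discordant_fraction = discordant / len(pairs)

    # Natural duplicates are discordant only about half the time, so scale by 2
    # (capped at 1.0 to absorb sampling noise).
    natural_fraction = min(1.0, 2.0 * discordant_fraction)
    return 1.0 - natural_fraction


# Example: 8 duplicate pairs, 2 of them with discordant alleles.
example = [("A", "A")] * 6 + [("A", "G")] * 2
print(estimate_pcr_fraction(example))  # 0.5
```

In this toy example, 2 of 8 pairs are discordant, which implies that about half of the pairs are natural duplicates and the remaining half are PCR duplicates. Larger clusters and sequencing errors require the fuller statistical treatment described in the paper.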

Summary

Introduction

PCR amplification is an important step in virtually all library preparation protocols for high-throughput sequencing technologies [1, 2]. In the standard Illumina library preparation protocol, after universal adapters are ligated to the pool of DNA fragments, PCR amplification is performed to enrich for fragments that have adapters ligated on both ends and can be sequenced successfully [3, 4]. For many reasons, it is of great interest to estimate the PCR duplication rate of high-throughput sequence datasets.



Journal: BMC Bioinformatics	Publication Date: Mar 1, 2017
Citations: 26	License type: open-access
