On the causes, consequences, and avoidance of PCR duplicates: Towards a theory of library complexity.

Nicolas C Rochette, Jessica Walsh, Julian M Catchen, Angel G Rivera‐Colón, Thomas J Sanger, Shane C Campbell‐Staton

doi:10.1111/1755-0998.13800

Open Access

Abstract

Library preparation protocols for most sequencing technologies involve PCR amplification of the template DNA, which opens the possibility that a given template DNA molecule is sequenced multiple times. Reads arising from this phenomenon, known as PCR duplicates, inflate the cost of sequencing and can jeopardize the reliability of affected experiments. Despite the pervasiveness of this artefact, our understanding of its causes and of its impact on downstream statistical analyses remains essentially empirical. Here, we develop a general quantitative model of amplification distortions in sequencing data sets, which we leverage to investigate the factors controlling the occurrence of PCR duplicates. We show that the PCR duplicate rate is determined primarily by the ratio between library complexity and sequencing depth, and that amplification noise (including its dependence on the number of PCR cycles) plays only a secondary role in this artefact. We confirm our predictions using new and published RAD-seq libraries and provide a method to estimate library complexity and amplification noise in any data set containing PCR duplicates. We discuss how amplification-related artefacts impact downstream analyses, in particular genotyping accuracy. The proposed framework unites the numerous observations made on PCR duplicates and will be useful to experimenters using any sequencing technology where DNA availability is a concern.
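The claim that the duplicate rate depends mainly on the ratio between sequencing depth and library complexity can be illustrated with a minimal sketch. This is not the authors' model: it assumes a noise-free library in which every template molecule is equally likely to be sequenced, so that reads are draws with replacement from `complexity` distinct templates. The function names and parameters (`expected_duplicate_rate`, `simulated_duplicate_rate`, `complexity`, `depth`) are hypothetical and chosen only for illustration.

```python
import math
import random


def expected_duplicate_rate(complexity: float, depth: float) -> float:
    """Expected fraction of reads that are duplicates.

    Assumption (not from the paper): no amplification noise, so each read
    is a uniform draw from `complexity` distinct template molecules.
    With ratio = depth / complexity, the expected number of distinct
    templates seen is approximately complexity * (1 - exp(-ratio)).
    """
    if complexity <= 0 or depth <= 0:
        raise ValueError("complexity and depth must be positive")
    ratio = depth / complexity
    # -expm1(-x) computes 1 - exp(-x) accurately for small x.
    expected_unique = complexity * -math.expm1(-ratio)
    return 1.0 - expected_unique / depth


def simulated_duplicate_rate(complexity: int, depth: int, seed: int = 0) -> float:
    """Monte Carlo check: draw `depth` reads uniformly from the templates."""
    rng = random.Random(seed)
    seen = {rng.randrange(complexity) for _ in range(depth)}
    return 1.0 - len(seen) / depth


if __name__ == "__main__":
    # Same depth/complexity ratio, different absolute sizes.
    for complexity, depth in [(1_000, 500), (100_000, 50_000), (100_000, 200_000)]:
        analytic = expected_duplicate_rate(complexity, depth)
        simulated = simulated_duplicate_rate(complexity, depth)
        print(f"N={complexity} n={depth} analytic={analytic:.3f} simulated={simulated:.3f}")
```

Under these assumptions the rate is a function of `depth / complexity` alone, which is why scaling both quantities by the same factor leaves the analytic value unchanged. Amplification noise, which the abstract describes as a secondary factor, is deliberately left out of this sketch.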

Journal: Molecular Ecology Resources
Publication Date: Apr 16, 2023
Citations: 10
License type: CC BY-NC-ND 4.0
