Atropos: specific, sensitive, and speedy trimming of sequencing reads.

John P Didion, Francis S Collins, Marcel Martin

doi:10.7717/peerj.3720


Open Access

https://doi.org/10.7717/peerj.3720


Abstract

A key step in the transformation of raw sequencing reads into biological insights is the trimming of adapter sequences and low-quality bases. Read trimming has been shown to increase the quality and reliability while decreasing the computational requirements of downstream analyses. Many read trimming software tools are available; however, no tool simultaneously provides the accuracy, computational efficiency, and feature set required to handle the types and volumes of data generated in modern sequencing-based experiments. Here we introduce Atropos and show that it trims reads with high sensitivity and specificity while maintaining leading-edge speed. Compared to other state-of-the-art read trimming tools, Atropos achieves significant increases in trimming accuracy while remaining competitive in execution times. Furthermore, Atropos maintains high accuracy even when trimming data with elevated rates of sequencing errors. The accuracy, high performance, and broad feature set offered by Atropos make it an appropriate choice for the pre-processing of Illumina, ABI SOLiD, and other current-generation short-read sequencing datasets. Atropos is open source and free software written in Python (3.3+) and available at https://github.com/jdidion/atropos.

Highlights

All current-generation sequencing technologies, including Illumina, ABI SOLiD, and Ion Torrent, require a library construction step that involves the introduction of short adapter sequences at the ends of the template DNA fragments.
We focused on making three specific improvements to Cutadapt: (1) improve the accuracy of paired-end read trimming by implementing an insert-match algorithm; (2) improve the performance by adding multiprocessing support; and (3) add important additional features such as automated trimming of Methyl-Seq reads, automated detection of adapter sequences in reads where the experimental protocols are not known to the analyst, estimation of sequencing error, and generation of quality control (QC) metrics.
Performance: On a desktop computer with four processing cores, we found that AdapterRemoval had the fastest overall execution time, followed closely by SeqPurge, Atropos, and Skewer (Fig. 2A; Table S2).
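
The insert-match idea mentioned in the highlights can be illustrated with a small sketch. This is not Atropos's implementation, which is not described in this excerpt; it only shows the general principle under simplifying assumptions. When the sequenced fragment (the insert) is shorter than the read length, read 1 begins with the insert and read 2 begins with its reverse complement, so anything after the insert in either read is adapter. The function name `insert_match_trim`, its parameters `min_overlap` and `max_error_rate`, and the mismatch-only comparison (no indels, no quality scores) are all hypothetical choices made for illustration.

```python
# Illustrative sketch only: names and thresholds are assumptions, not Atropos's API.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def insert_match_trim(read1, read2, min_overlap=10, max_error_rate=0.1):
    """Trim a read pair to the inferred insert length.

    For each candidate insert length (longest first), compare the start of
    read 1 with the reverse complement of the start of read 2. If they agree
    within the allowed mismatch budget, everything past that length is
    treated as adapter and removed from both reads. If no candidate matches,
    the reads are returned unchanged.
    """
    max_len = min(len(read1), len(read2))
    for insert_len in range(max_len, min_overlap - 1, -1):
        candidate1 = read1[:insert_len]
        candidate2 = reverse_complement(read2[:insert_len])
        allowed = int(max_error_rate * insert_len)
        if hamming(candidate1, candidate2) <= allowed:
            return read1[:insert_len], read2[:insert_len]
    return read1, read2


# Usage: a 20-base insert sequenced with 25-base reads, so each read
# carries 5 bases of adapter at its 3' end.
insert = "GATTACAGGCTCATTGCCAA"
adapter = "AGATCGGAAGAGC"
r1 = insert + adapter[:5]
r2 = reverse_complement(insert) + adapter[:5]
trimmed1, trimmed2 = insert_match_trim(r1, r2)
```

Because the match is made between the two reads rather than against a known adapter sequence, this style of trimming can remove even very short adapter fragments that an adapter-alignment search would struggle to distinguish from template sequence.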

Summary

Introduction

All current-generation sequencing technologies, including Illumina, ABI SOLiD, and Ion Torrent, require a library construction step that involves the introduction of short adapter sequences at the ends of the template DNA fragments. Depending on the sequencing platform and the fragment size distribution of the sequencing library, an often substantial fraction of reads will consist of both template and adapter sequences (Fig. 1A). The error rates of these sequencing technologies vary from 0.1% on Illumina to 5% or more on long-read sequencing platforms. Error rates tend to be enriched at the ends of reads (where adapters are located), exacerbating the effects of adapter contamination. Adapter contamination and sequencing errors can lead to increased rates of misaligned and unaligned reads, which results in errors in downstream analysis including spurious variant calls (Del Fabbro et al., 2013; Sturm, Schroeder & Bauer, 2016). Some methylation sequencing (Methyl-Seq) protocols result in artificially



Journal: PeerJ	Publication Date: Aug 30, 2017
Citations: 200	License type: CC0 1.0


Similar Papers

Read trimming has minimal effect on bacterial SNP-calling accuracy.
Stephen J Bush
Microbial Genomics | VOL. 6
01 Dec 2020

To Trim or Not to Trim: Effects of Read Trimming on the De Novo Genome Assembly of a Widespread East Asian Passerine, the Rufous-Capped Babbler (Cyanoderma ruficeps Blyth).
Shang-Fang Yang ... Chih-Ming Hung
Genes | VOL. 10
23 Sep 2019

CIPHER: a flexible and extensive workflow platform for integrative next-generation sequencing data analysis and genomic regulatory element prediction
Carlos Guzman ... Iván D'Orso
BMC Bioinformatics | VOL. 18
08 Aug 2017

An extensive evaluation of read trimming effects on Illumina NGS data analysis.
Cristian Del Fabbro ... Jeong-Sun Seo
PLoS ONE | VOL. 8
23 Dec 2013
