SECAPR-a bioinformatics pipeline for the rapid and user-friendly processing of targeted enriched Illumina sequences, from raw reads to alignments.

Tobias Andermann, Ángela Cano, Alexandre Antonelli, Christine Bacon, Alexander Zizka

doi:10.7717/peerj.5175


Open Access

https://doi.org/10.7717/peerj.5175


Abstract

Evolutionary biology has entered an era of unprecedented amounts of DNA sequence data, as new sequencing technologies such as Massive Parallel Sequencing (MPS) can generate billions of nucleotides within less than a day. The current bottleneck is how to efficiently handle, process, and analyze such large amounts of data in an automated and reproducible way. To tackle these challenges we introduce the Sequence Capture Processor (SECAPR) pipeline for processing raw sequencing data into multiple sequence alignments for downstream phylogenetic and phylogeographic analyses. SECAPR is user-friendly and we provide an exhaustive empirical data tutorial intended for users with no prior experience with analyzing MPS output. SECAPR is particularly useful for the processing of sequence capture (synonyms: target or hybrid enrichment) datasets for non-model organisms, as we demonstrate using an empirical sequence capture dataset of the palm genus Geonoma (Arecaceae). Various quality control and plotting functions help the user to decide on the most suitable settings for even challenging datasets. SECAPR is an easy-to-use, free, and versatile pipeline, aimed to enable efficient and reproducible processing of MPS data for many samples in parallel.

Highlights

An increasing number of studies apply sequence data generated by Massive Parallel Sequencing (MPS) to answer phylogeographic and phylogenetic questions (e.g., Botero-Castro et al., 2013; Smith et al., 2014a; Smith et al., 2014b; Faircloth et al., 2015; Heyduk et al., 2016).
Phylogenetic analysis software usually relies on multiple sequence alignments (MSAs) with homologous sequences across many taxa, which are simple to recover when enriching these sequences prior to sequencing.
Here we introduce the Sequence Capture Processor (SECAPR) pipeline, a semi-automated workflow to guide users from raw sequencing results to cleaned and filtered multiple sequence alignments (MSAs) for phylogenetic and phylogeographic analyses.

Summary

Introduction

An increasing number of studies apply sequence data generated by Massive Parallel Sequencing (MPS) to answer phylogeographic and phylogenetic questions (e.g., Botero-Castro et al., 2013; Smith et al., 2014a; Smith et al., 2014b; Faircloth et al., 2015; Heyduk et al., 2016). Researchers often decide to selectively enrich and sequence specific genomic regions of interest, rather than sequencing the complete genome. One reason is that enriching specific markers leads to a higher sequencing depth for each individual marker, as compared to the alternative of sequencing full genomes. Phylogenetic analysis software usually relies on multiple sequence alignments (MSAs) with homologous sequences across many taxa, which are simple to recover when enriching these sequences prior to sequencing.
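
The depth argument in the paragraph above can be made concrete with a back-of-the-envelope calculation. The sketch below is not part of SECAPR: the function name `expected_depth` and every number in it (read count, read length, genome size, target size, on-target fraction) are hypothetical values chosen only for illustration. It assumes that mean per-base depth is roughly the number of sequenced bases that land in a region divided by the size of that region.

```python
def expected_depth(n_reads: int, read_length: int, region_size: int,
                   fraction_in_region: float = 1.0) -> float:
    """Approximate mean per-base depth for a region.

    depth = (reads * read length * fraction of reads in the region) / region size
    """
    if region_size <= 0:
        raise ValueError("region_size must be positive")
    if not 0.0 <= fraction_in_region <= 1.0:
        raise ValueError("fraction_in_region must be between 0 and 1")
    return n_reads * read_length * fraction_in_region / region_size


# Hypothetical sequencing effort per sample (illustrative values only).
N_READS = 2_000_000          # reads per sample
READ_LENGTH = 150            # bases per read
GENOME_SIZE = 2_000_000_000  # assumed 2 Gb genome
TARGET_SIZE = 1_000_000      # assumed 1,000 target loci of 1,000 bp each
ON_TARGET = 0.5              # assumed share of reads that hit the targets

# Same sequencing effort, spread over the whole genome vs. the targets only.
wgs_depth = expected_depth(N_READS, READ_LENGTH, GENOME_SIZE)
capture_depth = expected_depth(N_READS, READ_LENGTH, TARGET_SIZE, ON_TARGET)

print(f"whole genome:     {wgs_depth:.2f}x")
print(f"sequence capture: {capture_depth:.0f}x")
```

Under these assumed numbers, the same sequencing effort gives about 0.15x mean depth across the whole genome but about 150x across the enriched targets, even though only half of the reads are on target. Real datasets will differ, since capture efficiency and locus lengths vary between samples and bait sets.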



Journal: PeerJ
Publication Date: Jul 13, 2018
Citations: 53
License type: CC BY 4.0
