SAMSA2: a standalone metatranscriptome analysis pipeline

Samuel T Westreich, David A Mills, Danielle G Lemay, Michelle L Treiber, Ian Korf

doi:10.1186/s12859-018-2189-z

Open Access

https://doi.org/10.1186/s12859-018-2189-z

Abstract

Background: Complex microbial communities are an area of growing interest in biology. Metatranscriptomics allows researchers to quantify microbial gene expression in an environmental sample via high-throughput sequencing. Metatranscriptomic experiments are computationally intensive because the experiments generate a large volume of sequence data and each sequence must be compared with reference sequences from thousands of organisms.

Results: SAMSA2 is an upgrade to the original Simple Annotation of Metatranscriptomes by Sequence Analysis (SAMSA) pipeline that has been redesigned for standalone use on a supercomputing cluster. SAMSA2 is faster due to the use of the DIAMOND aligner, and more flexible and reproducible because it uses local databases. SAMSA2 is available with detailed documentation, and example input and output files along with examples of master scripts for full pipeline execution.

Conclusions: SAMSA2 is a rapid and efficient metatranscriptome pipeline for analyzing large RNA-seq datasets in a supercomputing cluster environment. SAMSA2 provides simplified output that can be examined directly or used for further analyses, and its reference databases may be upgraded, altered or customized to fit the needs of any experiment.

Highlights

Complex microbial communities are an area of growing interest in biology
Improved speed and accuracy for metatranscriptome analysis: although version 1.0 of Simple Annotation of Metatranscriptomes by Sequence Analysis (SAMSA) provided a complete metatranscriptome analysis pipeline, it depended on a public web service (MG-RAST) for the annotation step, which was a major roadblock with respect to speed due to the growing popularity of this resource
SAMSA2 is standalone in the sense that the tool dependencies are downloadable and can be run on the user’s own compute resources

Summary

Introduction

Metatranscriptomics allows researchers to quantify microbial gene expression in an environmental sample via high-throughput sequencing. High-throughput sequencing methods are used to identify both culturable and unculturable microbial species. Although 16S ribosomal profiling is still most commonly used, researchers are adopting more comprehensive sequencing methods such as metagenomics and metatranscriptomics. Metatranscriptomics, the sequencing of all RNA from a diverse sample, captures all gene expression, giving a view of which microbes are active and what they are doing. Despite the power of metatranscriptomics, there are still relatively few bioinformatics tools designed to handle this complex type of data. Some recent metatranscriptome pipelines (anvi’o [7], IMP [8]) rely on BLAST [9] for annotations, which may be unacceptably slow when processing multiple millions of sequences per sample.
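To make the annotation step more concrete, the following is a minimal sketch of how aligner hits might be turned into per-reference read counts. It is not SAMSA2's own code. It assumes the aligner writes the standard 12-column BLAST tabular format (query ID in column 1, subject ID in column 2, bit score in column 12), and the function name `count_best_hits` and the example read and gene IDs are hypothetical.

```python
import csv
import io
from collections import Counter


def count_best_hits(tabular_text):
    """Count reads per reference, keeping only each read's top-scoring hit.

    Assumes 12-column BLAST tabular rows (an assumption, not a detail
    taken from the paper): qseqid, sseqid, ..., evalue, bitscore.
    """
    best = {}  # read ID -> (bit score, subject ID)
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if len(row) < 12:
            continue  # skip blank or malformed lines
        read_id, subject_id, bitscore = row[0], row[1], float(row[11])
        if read_id not in best or bitscore > best[read_id][0]:
            best[read_id] = (bitscore, subject_id)
    return Counter(subject for _, subject in best.values())


# Hypothetical hits: read1 matches two references, geneA with the higher score.
example = "\n".join([
    "read1\tgeneA\t98.0\t50\t1\t0\t1\t50\t1\t50\t1e-20\t95.0",
    "read1\tgeneB\t80.0\t50\t10\t0\t1\t50\t1\t50\t1e-05\t60.0",
    "read2\tgeneA\t95.0\t50\t2\t0\t1\t50\t1\t50\t1e-18\t88.0",
    "read3\tgeneB\t85.0\t50\t7\t0\t1\t50\t1\t50\t1e-08\t70.0",
])
counts = count_best_hits(example)
for subject, n in sorted(counts.items()):
    print(subject, n)
```

Keeping a single best hit per read means every read contributes to exactly one reference, so the counts sum to the number of annotated reads. A real pipeline may resolve ties or apply score thresholds differently.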

Journal: BMC Bioinformatics	Publication Date: May 21, 2018
Citations: 107	License type: open-access
