Abstract

Background: Complex microbial communities are an area of growing interest in biology. Metatranscriptomics allows researchers to quantify microbial gene expression in an environmental sample via high-throughput sequencing. Metatranscriptomic experiments are computationally intensive because the experiments generate a large volume of sequence data and each sequence must be compared with reference sequences from thousands of organisms.

Results: SAMSA2 is an upgrade to the original Simple Annotation of Metatranscriptomes by Sequence Analysis (SAMSA) pipeline that has been redesigned for standalone use on a supercomputing cluster. SAMSA2 is faster due to the use of the DIAMOND aligner, and more flexible and reproducible because it uses local databases. SAMSA2 is available with detailed documentation, example input and output files, and examples of master scripts for full pipeline execution.

Conclusions: SAMSA2 is a rapid and efficient metatranscriptome pipeline for analyzing large RNA-seq datasets in a supercomputing cluster environment. SAMSA2 provides simplified output that can be examined directly or used for further analyses, and its reference databases may be upgraded, altered or customized to fit the needs of any experiment.

Highlights

  • Complex microbial communities are an area of growing interest in biology

  • Improved speed and accuracy for metatranscriptome analysis: although version 1.0 of Simple Annotation of Metatranscriptomes by Sequence Analysis (SAMSA) provided a complete metatranscriptome analysis pipeline, it depended on a public web service (MG-RAST) for the annotation step, which was a major roadblock with respect to speed due to the growing popularity of this resource

  • SAMSA2 is standalone in the sense that the tool dependencies are downloadable and can be run on the user’s own compute resources

Introduction

Metatranscriptomics allows researchers to quantify microbial gene expression in an environmental sample via high-throughput sequencing. High-throughput sequencing methods are used to identify both culturable and unculturable microbial species. Although 16S ribosomal profiling is still most commonly used, researchers are adopting more comprehensive sequencing methods such as metagenomics and metatranscriptomics. Metatranscriptomics, the sequencing of all RNA from a diverse sample, captures all gene expression, giving a view of which microbes are active and what they are doing. Despite the power of metatranscriptomics, there are still relatively few bioinformatics tools designed to handle this complex type of data. Some recent metatranscriptome pipelines (anvi’o [7], IMP [8]) rely on BLAST [9] for annotations, which may be unacceptably slow when processing millions of sequences per sample.

