Abstract
BackgroundComplex microbial communities are an area of growing interest in biology. Metatranscriptomics allows researchers to quantify microbial gene expression in an environmental sample via high-throughput sequencing. Metatranscriptomic experiments are computationally intensive because the experiments generate a large volume of sequence data and each sequence must be compared with reference sequences from thousands of organisms.ResultsSAMSA2 is an upgrade to the original Simple Annotation of Metatranscriptomes by Sequence Analysis (SAMSA) pipeline that has been redesigned for standalone use on a supercomputing cluster. SAMSA2 is faster due to the use of the DIAMOND aligner, and more flexible and reproducible because it uses local databases. SAMSA2 is available with detailed documentation, and example input and output files along with examples of master scripts for full pipeline execution.ConclusionsSAMSA2 is a rapid and efficient metatranscriptome pipeline for analyzing large RNA-seq datasets in a supercomputing cluster environment. SAMSA2 provides simplified output that can be examined directly or used for further analyses, and its reference databases may be upgraded, altered or customized to fit the needs of any experiment.
Highlights
Complex microbial communities are an area of growing interest in biology
Improved speed and accuracy for metatranscriptome analysis the version 1.0 of Simple Annotation of Metatranscriptomes by Sequence Analysis (SAMSA) provided a complete metatranscriptome analysis pipeline, it depended on a public web service (MG-RAST) for the annotation step, which was a major roadblock with respect to speed due the growing popularity of this resource
SAMSA2 is standalone in the sense that the tool dependencies are downloadable and can be run on the user’s own compute resources
Summary
Metatranscriptomics allows researchers to quantify microbial gene expression in an environmental sample via high-throughput sequencing. High-throughput sequencing methods are used to identify both culturable and unculturable microbial species. 16S ribosomal profiling is still most commonly used, researchers are adopting more comprehensive sequencing methods such as metagenomics and metatranscriptomics. Metatranscriptomics—sequencing of all RNA from a diverse sample—captures all gene expression, giving a view of which microbes are active and what they are doing. Despite the power of metatranscriptomics, there are still relatively few bioinformatics tools designed to handle this complex type of data. Some recent metatranscriptome pipelines (anvi’o, [7], IMP, [8]) rely on BLAST [9] for annotations, which may be unacceptably slow when processing multiple millions of sequences per sample
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have