CloVR-Comparative: automated, cloud-enabled comparative microbial genome sequence analysis pipeline

Sonia Agrawal, W Florian Fricke, Kevin Galens, Cesar Arze, Owen White, Samuel V Angiuoli, Hervé Tettelin, Jonathan Crabtree, David Riley, Anup Mahurkar, Ricky S Adkins, Claire M Fraser, Mahesh Vangala

doi:10.1186/s12864-017-3717-3

Open Access

https://doi.org/10.1186/s12864-017-3717-3

Abstract

Background: The benefit of increasing genomic sequence data to the scientific community depends on easy-to-use, scalable bioinformatics support. CloVR-Comparative combines commonly used bioinformatics tools into an intuitive, automated, and cloud-enabled analysis pipeline for comparative microbial genomics.

Results: CloVR-Comparative runs on annotated complete or draft genome sequences that are uploaded by the user or selected via a taxonomic tree-based user interface and downloaded from NCBI. CloVR-Comparative runs reference-free multiple whole-genome alignments to determine unique, shared and core coding sequences (CDSs) and single nucleotide polymorphisms (SNPs). Output includes short summary reports and detailed text-based results files, graphical visualizations (phylogenetic trees, circular figures), and a database file linked to the Sybil comparative genome browser. Data upload and download, pipeline configuration and monitoring, and access to Sybil are managed through the CloVR-Comparative web interface. CloVR-Comparative and Sybil are distributed as part of the CloVR virtual appliance, which runs on local computers or the Amazon EC2 cloud. Representative datasets (e.g., 40 draft and complete Escherichia coli genomes) are processed in <36 h on a local desktop or at a cost of <$20 on EC2.

Conclusions: CloVR-Comparative allows anybody with Internet access to run comparative genomics projects, while eliminating the need for on-site computational resources and expertise.
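To make the distinction between unique, shared and core CDSs concrete, the following is a minimal sketch, not the pipeline's actual implementation. It assumes that CDSs have already been grouped into clusters of corresponding genes (in CloVR-Comparative this grouping comes from the whole-genome alignments); the function name `classify_clusters`, the cluster identifiers and the genome labels are hypothetical and chosen only for illustration.

```python
def classify_clusters(clusters, n_genomes):
    """Sort CDS clusters into core, shared and unique groups.

    clusters  : dict mapping a (hypothetical) cluster id to the set of
                genome names that contain a CDS in that cluster
    n_genomes : total number of genomes in the comparison
    """
    groups = {"core": [], "shared": [], "unique": []}
    for cluster_id, genomes in sorted(clusters.items()):
        n_present = len(genomes)
        if n_present == n_genomes:
            # Present in every genome of the comparison
            groups["core"].append(cluster_id)
        elif n_present == 1:
            # Present in exactly one genome
            groups["unique"].append(cluster_id)
        else:
            # Present in more than one genome, but not in all
            groups["shared"].append(cluster_id)
    return groups


# Toy input: three genomes (A, B, C) and three illustrative clusters
example_clusters = {
    "cluster_1": {"A", "B", "C"},
    "cluster_2": {"A", "B"},
    "cluster_3": {"C"},
}
example_groups = classify_clusters(example_clusters, n_genomes=3)
print(example_groups)
```

In this toy input, `cluster_1` is core, `cluster_2` is shared and `cluster_3` is unique. Real draft genomes complicate the picture, because a CDS missing from an incomplete assembly can make a core cluster look shared.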

Highlights

The benefit of increasing genomic sequence data to the scientific community depends on easy-to-use, scalable bioinformatics support.
We describe Cloud Virtual Resource (CloVR)-Comparative, an open-source, automated, easy-to-use bioinformatics pipeline for comparative genome sequence analysis.
The CloVR-Comparative output is organized into four groups, the first of which is summary reports. To give the user a quick overview and a fast way to review the success of the analysis, CloVR-Comparative generates a summary report for each pipeline run, which includes references to the original publications on individual analysis components as background information for interested users.

Background

Circleator

Summary report files

Results and discussion