Abstract

Metagenomic sequencing has produced large volumes of data in recent years. For example, as of summer 2013, MG-RAST has been used to annotate over 110,000 data sets totaling over 43 Terabases. As metagenomic sequencing finds even wider adoption in the scientific community, the existing web-based analysis tools and infrastructure in MG-RAST provide only limited capability for data retrieval and analysis, such as comparative analysis between multiple data sets. Moreover, although the system provides many analysis tools, it is not comprehensive. By opening MG-RAST up via a web services API (application programming interface), we have greatly expanded access to MG-RAST data and provided a mechanism for using third-party analysis tools with MG-RAST data. This RESTful API makes all data and data objects created by the MG-RAST pipeline accessible as JSON objects. We implemented this web services API as part of the DOE Systems Biology Knowledgebase project (KBase, http://kbase.us). The API complements the existing MG-RAST web interface and forms the basis of KBase's microbial community capabilities. In addition, it exposes a comprehensive collection of data to programmers. Because it uses a RESTful (Representational State Transfer) implementation, the API is compatible with most programming environments and should be easy to use for end users and third parties. It provides comprehensive access to sequence data, quality control results, annotations, and many other data types. Where feasible, we have used standards to expose data and metadata. Code examples are provided in a number of languages, both to show the versatility of the API and to provide a starting point for users. We present an API that exposes the data in MG-RAST for consumption by our users, greatly enhancing the utility of the MG-RAST service.



Introduction

Over 110,000 metagenomic data sets have been uploaded and analyzed in MG-RAST [1] since 2007, totaling over 43 Terabases (TBp). Uniform processing and robust sequence quality control enable comparison across experimental systems and, to some extent, across sequencing platforms. With the inclusion of standardized metadata [2], MG-RAST has enabled meta-analysis through its web-based user interface at http://metagenomics.anl.gov. As with most GUIs, there are limitations to what can be done. Examples include the number of samples that can be processed in a single analysis, access to complete metadata, and easy access to raw data and quality metrics for each sample. As part of the DOE Systems Biology Knowledgebase project (KBase), we have implemented a web services application programming interface (API) that exposes all data to (authenticated) programmers, enabling users to access available data and functionality through software applications. User access to MG-RAST’s internal data structures is possible
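To make the idea of programmatic access concrete, the following is a minimal Python sketch of how a client might address a resource in a RESTful service and decode the JSON object it returns. It is an illustration under stated assumptions, not the MG-RAST client: the base URL `https://api.example.org/1`, the resource name `metagenome`, the identifier `mgm0000000.3`, the `verbosity` query parameter, and the `Authorization` header scheme are all placeholders chosen for this example and are not taken from the text above.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

# Assumption: placeholder base URL for illustration only.
API_BASE = "https://api.example.org/1"


def build_url(resource, identifier=None, **params):
    """Build a REST-style URL: <base>/<resource>[/<identifier>][?query]."""
    parts = [resource] if identifier is None else [resource, identifier]
    url = API_BASE + "/" + "/".join(quote(p, safe="") for p in parts)
    if params:
        # Sort the parameters so the resulting URL is deterministic.
        url += "?" + urlencode(sorted(params.items()))
    return url


def decode_payload(raw_bytes):
    """Decode a UTF-8 JSON response body into Python objects."""
    return json.loads(raw_bytes.decode("utf-8"))


def fetch_json(url, auth_token=None, timeout=30):
    """GET a URL and return the decoded JSON object.

    The header used to pass the token is hypothetical; consult the
    service's own documentation for the real authentication scheme.
    """
    headers = {"Accept": "application/json"}
    if auth_token is not None:
        headers["Authorization"] = auth_token  # hypothetical scheme
    request = Request(url, headers=headers)
    with urlopen(request, timeout=timeout) as response:
        return decode_payload(response.read())


if __name__ == "__main__":
    # Only builds and prints the URL; no network request is made here.
    print(build_url("metagenome", "mgm0000000.3", verbosity="full"))
```

Because every resource is addressed by a URL and returned as JSON, the same pattern carries over to any language with an HTTP client and a JSON parser, which is the portability argument made for a RESTful design.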
