Abstract

Summary: The Galaxy platform has developed into a fully featured collaborative workbench, with goals of inherently capturing provenance to enable reproducible data analysis, and of making it straightforward to run one’s own server. However, many Galaxy platform tools rely on the presence of reference data, such as alignment indexes, to function efficiently. Until now, the building of this cache of data for Galaxy has been an error-prone manual process lacking reproducibility and provenance. The Galaxy Data Manager framework is an enhancement that changes the management of Galaxy’s built-in data cache from a manual procedure to an automated graphical user interface (GUI) driven process, which offers the same openness, reproducibility and provenance that is afforded to Galaxy’s analysis tools. Data Manager tools allow the Galaxy administrator to download, create and install additional datasets for any type of reference data in real time.

Availability and implementation: The Galaxy Data Manager framework is implemented in Python and has been integrated as part of the core Galaxy platform. Individual Data Manager tools can be defined locally or installed from a ToolShed, allowing the Galaxy community to define additional Data Manager tools as needed, with full versioning and dependency support.

Contact: dan@bx.psu.edu or anton@bx.psu.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Highlights

  • Galaxy (Blankenberg et al, 2010; Giardine et al, 2005; Goecks et al, 2010) is a web-based platform for performing large-scale data analysis

  • The only difference between defining a standard Galaxy tool and a Data Manager tool is the inclusion of the type="data_manager" attribute on the <tool> element; this declaration has the effect of instructing Galaxy to provide a JavaScript Object Notation (JSON) encoded dictionary of parameter and server settings, to be optionally used by the executable, and of triggering the Data Manager framework to process the tool output into new data table entries

  • In addition to automating the administration of Galaxy’s built-in data cache, the Data Manager framework provides a pluggable approach for ensuring reproducibility and provenance tracking of reference data
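The JSON exchange described in the highlights can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the framework's actual implementation: the key names `param_dict`, `dbkey`, `display_name` and `data_tables`, the table name `all_fasta`, and the helper `build_data_table_entries` are hypothetical choices that the text above does not confirm. The sketch only shows the general shape of the idea: a Data Manager executable reads a JSON-encoded dictionary of parameters and emits JSON that could be processed into new data table entries.

```python
import json


def build_data_table_entries(params_json: str, table_name: str = "all_fasta") -> str:
    """Convert a JSON parameter dictionary into a JSON document of table entries.

    All key names used here are assumptions made for illustration only.
    """
    params = json.loads(params_json)

    # Assumed layout: the tool parameters sit under a 'param_dict' key.
    tool_params = params.get("param_dict", {})
    dbkey = tool_params["dbkey"]

    # One new row for the (hypothetical) reference data table.
    entry = {
        "value": dbkey,
        "dbkey": dbkey,
        "name": tool_params.get("display_name", dbkey),
        "path": f"{dbkey}.fa",
    }

    # Assumed output layout: table name mapped to a list of new entries.
    return json.dumps({"data_tables": {table_name: [entry]}}, sort_keys=True)


if __name__ == "__main__":
    example_input = json.dumps(
        {"param_dict": {"dbkey": "hg19", "display_name": "Human (hg19)"}}
    )
    print(build_data_table_entries(example_input))
```

Keeping the conversion in a pure function that maps a JSON string to a JSON string makes it easy to check the produced entries without a running Galaxy server.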


Summary

INTRODUCTION

Galaxy (Blankenberg et al, 2010; Giardine et al, 2005; Goecks et al, 2010) is a web-based platform for performing large-scale data analysis. It is a completely open-source project that supports accessible, reproducible and transparent computational research and is available through the use of free public servers, private local installations and by launching instances in the Cloud. A new menu option, ‘Manage local data’, has been added to the Galaxy administrator interface. Accessing this option enables an administrator to run Data Manager tools, inspect the results of individual Data Manager executions and view the current state of Galaxy’s built-in data registries. Although the Data Manager framework negates the need for the manual curating of reference data, it is compatible with any previously existing policy or process in use for a Galaxy installation.

METHODS
Data Manager tools
Data Manager configurations
CONCLUSIONS