DockerBIO: web application for efficient use of bioinformatics Docker images.

Changhyuk Kwon, Jason Kim, Jaegyoon Ahn

doi:10.7717/peerj.5954

Abstract

Background and Objective

Docker is a lightweight containerization program that performs almost as well as a local environment. Recently, many bioinformatics tools have been distributed as Docker images that include not only the tools themselves but also complex settings such as libraries, configurations, and, if needed, data. Users can simply download and run these images without compiling and configuring the tools, and can obtain reproducible results. In spite of these advantages, several problems remain. First, there are no clear standards for distributing Docker images, and Docker Hub often provides multiple images with the same objective but different usage. For these reasons, it can be difficult for users to learn how to select and use them. Second, Docker images are often unsuitable as pipeline components, because many of them include large data sets. Moreover, a group of users can have difficulty sharing a pipeline composed of Docker images: members of the group may modify scripts or use different versions of the data, which leads to inconsistent results.

Methods and Results

To handle the problems described above, we developed a Java web application, DockerBIO, which provides reliable, verified, lightweight Docker images for various bioinformatics tools and various kinds of reference data. With DockerBIO, users can easily build a pipeline from the tools and data registered at DockerBIO and, if necessary, register new tools or data. Built pipelines are themselves registered in DockerBIO, which provides an efficient environment for running them. This enables user groups to run their pipelines without expending much effort to copy and modify them.
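To make the idea of keeping reference data out of tool images more concrete, here is a minimal Python sketch, assuming a pipeline is simply an ordered list of tool images plus separately versioned reference-data images. The class names (`DataImage`, `ToolStep`, `Pipeline`), the image names under `example/`, and the file names are all hypothetical; they are not DockerBIO's actual data model or API. The sketch only generates `docker` command lines (it does not run them) and uses the standard Docker data-container pattern (`docker create -v` followed by `docker run --volumes-from`) to share one copy of the reference data with lightweight tool containers.

```python
"""Hypothetical sketch: composing a pipeline from separate tool and data images.

None of these names come from DockerBIO; they only illustrate keeping
reference data out of tool images and pinning image versions.
"""
import shlex
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class DataImage:
    name: str         # container name used to share the data volume
    image: str        # pinned data image, e.g. "example/hg19-ref:1.0" (hypothetical)
    mount_point: str  # path inside the image that holds the reference data


@dataclass(frozen=True)
class ToolStep:
    image: str        # pinned tool image, e.g. "example/bwa:0.7.17" (hypothetical)
    command: str      # shell command run inside the tool container


@dataclass
class Pipeline:
    data: List[DataImage] = field(default_factory=list)
    steps: List[ToolStep] = field(default_factory=list)

    def docker_commands(self, workdir: str) -> List[str]:
        """Return docker CLI invocations that would run the pipeline in order."""
        cmds = []
        # One data container per reference image; "-v <path>" turns that path
        # into a volume (initialized from the image) that other containers can borrow.
        for d in self.data:
            cmds.append(shlex.join(
                ["docker", "create", "--name", d.name, "-v", d.mount_point, d.image]))
        # Each tool step mounts every data volume plus the shared work directory.
        for s in self.steps:
            args = ["docker", "run", "--rm"]
            for d in self.data:
                args += ["--volumes-from", d.name]
            args += ["-v", f"{workdir}:/work", "-w", "/work",
                     s.image, "sh", "-c", s.command]
            cmds.append(shlex.join(args))
        return cmds


if __name__ == "__main__":
    # Hypothetical usage: one reference-data image and one alignment step.
    ref = DataImage("hg19_ref", "example/hg19-ref:1.0", "/ref")
    pipe = Pipeline(
        data=[ref],
        steps=[ToolStep("example/bwa:0.7.17",
                        "bwa mem /ref/hg19.fa reads.fq > aln.sam")],
    )
    for c in pipe.docker_commands("/tmp/run1"):
        print(c)
```

If every member of a group runs the same generated commands with the same pinned tags, they use identical tool and data versions, provided the tags are never overwritten; pinning by digest (`image@sha256:...`) is the stricter option. Keeping reference data in its own image also means a tool image can be updated without pulling the data again.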

Highlights

A huge number of bioinformatics tools and pipelines have been developed and distributed, but in many cases they are not easy to use.
The running times for two DNA-seq experiments are shown in Tables 5 and 6, and for two RNA-seq experiments in Tables 7 and 8.
We tested DockerBIO using DNA-seq and RNA-seq pipelines, but we think DockerBIO may be useful for a wider range of researchers in various subfields, such as ecology, evolutionary biology, structural biology, and systems biology, because DockerBIO provides a flexible environment that can accommodate a variety of bioinformatics tools.

Summary

Introduction

A huge number of bioinformatics tools and pipelines have been developed and distributed, but in many cases they are not easy to use. The reasons for this are: (1) they were developed and tested on specific versions of different operating systems; (2) they may require additional libraries; and (3) tools or pipelines for analyzing big data, such as Next-Generation Sequencing (NGS) experiments, require distributed file systems and RAID settings. For these reasons, even skilled bioinformaticians have difficulty using them and spend too much effort setting up environments to run them.

Methods

Results

Discussion

Conclusion