Abstract

Genomic pipelines consist of several pieces of third party software and, because of their experimental nature, frequent changes and updates are commonly necessary thus raising serious deployment and reproducibility issues. Docker containers are emerging as a possible solution for many of these problems, as they allow the packaging of pipelines in an isolated and self-contained manner. This makes it easy to distribute and execute pipelines in a portable manner across a wide range of computing platforms. Thus, the question that arises is to what extent the use of Docker containers might affect the performance of these pipelines. Here we address this question and conclude that Docker containers have only a minor impact on the performance of common genomic pipelines, which is negligible when the executed jobs are long in terms of computational time.

Highlights

  • Genomic pipelines usually rely on a combination of several pieces of third party research software

  • We have assessed the impact of Docker containers technology on the performance of genomic pipelines, showing that container “virtualization” has a negligible overhead on pipeline performance when it is composed of medium/long running tasks, which is the most common scenario in computational genomic pipelines

  • Docker images, though smaller than an equivalent virtual machine image, need some time in order to be downloaded from a remote repository

Read more

Summary

Introduction

Genomic pipelines usually rely on a combination of several pieces of third party research software These applications tend to be academic prototypes that are often difficult to install, configure and deploy. Machine virtualization technology was proposed as an answer to this issue (Howe, 2012; Gent, 2013). Even to add or change just a single file, an overall copy of the virtual machine needs to be assembled and deployed It is difficult, if not impossible, to reuse pieces of software or data inside a virtual machine. Their content tends to be opaque i.e., not systematically described or accessible with a standard tool/protocol (Hinsen, 2014)

Methods
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.