Performance and scaling behavior of bioinformatic applications in virtualization environments to create awareness for the efficient use of compute resources.

Maximilian Hanussek,Jens Krüger,Felix Bartusch

doi:10.1371/journal.pcbi.1009244

Open Access

Journal: PLoS computational biology	Publication Date: Jul 20, 2021
Citations: 8	License type: CC BY 4.0

Affiliation: University of Tübingen

Abstract

The large amount of biological data available today makes it necessary to use tools and applications based on sophisticated and efficient algorithms developed in the field of bioinformatics. Further, access to high-performance computing resources is necessary to achieve results in a reasonable time. To speed up applications and utilize available compute resources as efficiently as possible, software developers make use of parallelization mechanisms such as multithreading. Many of the available tools in bioinformatics offer multithreading capabilities, but more compute power is not always helpful. In this study, we investigated the behavior of well-known bioinformatics applications with respect to their scaling performance, different virtual environments, and different datasets, using our benchmarking tool suite BOOTABLE. The tool suite includes the tools BBMap, Bowtie2, BWA, Velvet, IDBA, SPAdes, Clustal Omega, MAFFT, SINA and GROMACS. In addition, we added an application using the machine learning framework TensorFlow. Machine learning is not directly part of bioinformatics but is applied to many biological problems, especially in the context of medical images (X-ray photographs). The tools were analyzed in two different virtual environments: a virtual machine environment based on the OpenStack cloud software and a Docker environment. The measured performance values were compared to a bare-metal setup and to each other. The study reveals that the virtual environments used produce an overhead in the range of seven to twenty-five percent compared to the bare-metal environment. The scaling measurements showed that some of the analyzed tools do not benefit from larger amounts of computing resources, whereas others showed almost linear scaling behavior. The findings of this study have been generalized as far as possible and should help users to find the best amount of resources for their analysis.
Further, the results provide valuable information for resource providers to handle their resources as efficiently as possible and raise the user community's awareness of the efficient usage of computing resources.
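
To make the quantities in the abstract concrete, the following minimal Python sketch shows one common way to compute speedup, parallel efficiency, and virtualization overhead from wall-clock times. It is an illustration only and not part of BOOTABLE: the function names and all runtime values are hypothetical and were chosen purely for demonstration.

```python
def speedup(t_one_core, t_n_cores):
    """Speedup of a run on n cores relative to the single-core run."""
    return t_one_core / t_n_cores


def parallel_efficiency(t_one_core, t_n_cores, n_cores):
    """Fraction of ideal (linear) speedup achieved; 1.0 means perfect scaling."""
    return speedup(t_one_core, t_n_cores) / n_cores


def overhead_percent(t_virtual, t_bare_metal):
    """Extra runtime of a virtualized run relative to bare metal, in percent."""
    return 100.0 * (t_virtual - t_bare_metal) / t_bare_metal


if __name__ == "__main__":
    # Hypothetical wall-clock times in seconds, keyed by number of cores.
    # These are NOT measurements from the study.
    runtimes = {1: 1000.0, 2: 520.0, 4: 280.0, 8: 170.0}
    for cores in sorted(runtimes):
        s = speedup(runtimes[1], runtimes[cores])
        e = parallel_efficiency(runtimes[1], runtimes[cores], cores)
        print(f"{cores} cores: speedup {s:.2f}, efficiency {e:.2f}")

    # Hypothetical comparison of a virtualized run against bare metal.
    print(f"overhead: {overhead_percent(115.0, 100.0):.1f} %")
```

An efficiency that falls well below 1.0 as cores are added suggests that a tool does not benefit from the additional resources, which is the kind of behavior the scaling measurements are meant to expose.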

Highlights

Today’s sequencing technologies are becoming increasingly sophisticated and produce ever larger amounts of data, on the scale of tera- and petabytes, in almost every -omics area
We focus on the issues of scaling, the impact of different virtualization environments and datasets for widely used bioinformatic applications

Summary

Introduction

Today’s sequencing technologies are becoming increasingly sophisticated and produce ever larger amounts of data, on the scale of tera- and petabytes, in almost every -omics area (genomics, proteomics, metabolomics). To analyze such huge amounts of data at scale, advanced algorithms and applications developed by bioinformaticians are becoming more and more important for answering the underlying biological questions. Smart algorithms and their efficient implementation are one part. Some applications can benefit from multiple CPU cores due to their underlying algorithms or implementation; others cannot. It would be desirable for users and resource providers to know in advance how many resources, such as CPU cores, memory and storage, are reasonable to conduct computations most efficiently. Scalability is one factor; another is the increasing use of virtualization technologies, driven in particular by the growing number of compute cloud offerings. Such compute clouds usually provide access to virtual machines but not directly to the hardware, as high performance computing (HPC) clusters do. What kind of effect could that have on the tools and applications used?
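
One textbook way to see why some applications stop benefiting from more CPU cores is Amdahl's law, which bounds the speedup by the fraction of the work that can run in parallel. The sketch below is an illustration under that assumption and is not a model used or fitted in the study; the parallel fractions are hypothetical.

```python
def amdahl_speedup(parallel_fraction, n_cores):
    """Upper bound on speedup under Amdahl's law.

    parallel_fraction: share of the runtime that can be parallelized (0.0 to 1.0).
    n_cores: number of CPU cores used.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)


if __name__ == "__main__":
    # Hypothetical parallel fractions, chosen only to contrast two regimes.
    for fraction in (0.50, 0.99):
        bounds = [round(amdahl_speedup(fraction, n), 2) for n in (1, 2, 4, 8, 16, 32)]
        print(f"parallel fraction {fraction:.2f}: {bounds}")
```

With a parallel fraction of 0.5 the bound can never exceed 2 regardless of core count, whereas a fraction close to 1 stays near linear scaling for moderate core counts.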
