Interoperable and scalable data analysis with microservices: applications in metabolomics.

Payam Emami Khoonsari, Kenneth Haug, Matteo Carone, Christoph Steinbeck, Ola Spjuth, Anders Larsson, Rico Rueedi, Michael Van Vliet, Thomas Hankemeier, Kim Kultima, Pedro De Atauri, Gianluigi Zanetti, Carles Foguet, Philippe Rocca-Serra, Stephanie Herman, Noureddin Sadawi, Marta Cascante, Joachim Burman, Namrata Kale, Sijin He, Christoph Ruttkies, Daniel Schober, Steffen Neumann, Kristian Peters, David Johnson, Sven Bergmann, Luca Pireddu, Pierrick Roger, Marco Capuccini, Vitaly A Selivanov, Reza M Salek, Alejandra N González-Beltrán, Etienne Thévenot, Pablo Moreno-Ger, Susanna‐Assunta Sansone

doi:10.1093/bioinformatics/btz160

Abstract

Motivation: Developing a robust and performant data analysis workflow that integrates all necessary components whilst still being able to scale over multiple compute nodes is a challenging task. We introduce a generic method based on the microservice architecture, where software tools are encapsulated as Docker containers that can be connected into scientific workflows and executed using the Kubernetes container orchestrator.

Results: We developed a Virtual Research Environment (VRE) which facilitates rapid integration of new tools and developing scalable and interoperable workflows for performing metabolomics data analysis. The environment can be launched on-demand on cloud resources and desktop computers. IT-expertise requirements on the user side are kept to a minimum, and workflows can be re-used effortlessly by any novice user. We validate our method in the field of metabolomics on two mass spectrometry, one nuclear magnetic resonance spectroscopy and one fluxomics study. We showed that the method scales dynamically with increasing availability of computational resources. We demonstrated that the method facilitates interoperability using integration of the major software suites resulting in a turn-key workflow encompassing all steps for mass-spectrometry-based metabolomics including preprocessing, statistics and identification. Microservices is a generic methodology that can serve any scientific discipline and opens up for new types of large-scale integrative science.

Availability and implementation: The PhenoMeNal consortium maintains a web portal (https://portal.phenomenal-h2020.eu) providing a GUI for launching the Virtual Research Environment. The GitHub repository https://github.com/phnmnl/ hosts the source code of all projects.

Supplementary information: Supplementary data are available at Bioinformatics online.

Highlights

Biology is becoming data-intensive as high-throughput experiments in genomics or metabolomics are rapidly generating datasets of massive volume and complexity (Marx, 2013; Schadt et al., 2010), posing a fundamental challenge for large-scale data analytics.
Currently, the most common large-scale computational infrastructures in science are shared High-Performance Computing (HPC) systems.
We validate our method in the field of metabolomics on two mass spectrometry, one nuclear magnetic resonance spectroscopy and one fluxomics study.
We demonstrated that the method facilitates interoperability using integration of the major software suites resulting in a turn-key workflow encompassing all steps for mass-spectrometry-based metabolomics including preprocessing, statistics and identification.

Summary

Introduction

The most common large-scale computational infrastructures in science are shared High-Performance Computing (HPC) systems. Such systems are usually designed primarily to support computationally intensive batch jobs (e.g. for the simulation of physical processes) and are managed by specialized system administrators. Containers are more compact, and since they share the same operating system kernel, they are fast to start and stop and incur little overhead in execution. These traits make them an ideal solution to implement lightweight microservices, a software engineering methodology in which complex applications are divided into a collection of smaller, loosely coupled components that communicate over a network (Newman, 2015). Another important feature of microservices is that they have a technology-agnostic communication protocol, and can serve as building blocks that can be combined and reused in multiple ways (da Veiga Leprevost et al., 2017).
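To make the idea of loosely coupled components communicating over a network more concrete, the following is a minimal sketch in Python using only the standard library. It is not taken from the PhenoMeNal code base: the service (a hypothetical intensity-normalization step), the JSON field names `intensities` and `normalized`, and the helper functions `start_service` and `call_service` are all assumptions made for illustration. In the setup the paper describes, a tool like this would be packaged as a Docker container and scheduled by Kubernetes; here both sides run in one process to keep the example self-contained.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class NormalizeHandler(BaseHTTPRequestHandler):
    """Hypothetical service: scales a list of intensities so they sum to 1."""

    def do_POST(self):
        # Read the JSON request body sent by the client.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        intensities = payload["intensities"]
        total = sum(intensities)
        # Leave the values untouched when the total is zero.
        normalized = [x / total for x in intensities] if total else intensities
        body = json.dumps({"normalized": normalized}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def start_service():
    """Start the service in a background thread on a free local port."""
    server = HTTPServer(("127.0.0.1", 0), NormalizeHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_service(port, intensities):
    """Client side: knows only the URL and the JSON format, not the code."""
    request = urllib.request.Request(
        f"http://127.0.0.1:{port}/",
        data=json.dumps({"intensities": intensities}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # Bypass any proxy settings so the local call always goes direct.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request) as response:
        return json.loads(response.read())["normalized"]


if __name__ == "__main__":
    server = start_service()
    print(call_service(server.server_address[1], [1.0, 3.0]))
    server.shutdown()
```

Because the client depends only on HTTP and JSON, the service behind the URL could be reimplemented in another language or swapped for a different tool without changing the caller, which is the technology-agnostic property the paragraph above refers to.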

Journal: Bioinformatics	Publication Date: Mar 9, 2019
Citations: 27	License type: CC BY 4.0
