Abstract

The designers of a new coordination interface for enacting complex workflows have to tackle a dichotomy: choosing a language-independent or a language-dependent approach. Language-independent approaches decouple workflow models from the host code’s business logic and promote portability. Language-dependent approaches foster flexibility and performance by adopting the same host language for business and coordination code. Jupyter Notebooks, with their capability to describe both imperative and declarative code in a single format, allow combining the best of the two approaches, maintaining a clear separation between application and coordination layers while still providing a unified interface to both aspects. We advocate Jupyter Notebooks’ potential to express complex distributed workflows, identifying the general requirements for a Jupyter-based Workflow Management System (WMS) and introducing a proof-of-concept portable implementation working on hybrid Cloud-HPC infrastructures. As a byproduct, we extended the vanilla IPython kernel with workflow-based parallel and distributed execution capabilities. The proposed Jupyter-workflow (Jw) system is evaluated on common scenarios for High Performance Computing (HPC) and Cloud, showing its potential in lowering the barriers between prototypical Notebooks and production-ready implementations.
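To make this duality concrete, the sketch below shows how a single Notebook cell, as stored in the .ipynb JSON file, can pair imperative Python code with declarative coordination metadata. The standard fields (`cell_type`, `source`, `metadata`, `outputs`) are part of the real Notebook format; the `workflow` key and its `step`/`target` contents are illustrative assumptions, not the actual Jw schema.

```python
# A minimal sketch of one Notebook cell as it appears inside the .ipynb
# JSON document. "cell_type", "source", "metadata", and "outputs" are
# genuine Notebook-format fields; the "workflow" metadata key and its
# contents are hypothetical, shown only to illustrate how declarative
# coordination data can sit next to imperative business logic.
cell = {
    "cell_type": "code",
    "source": [
        "result = preprocess(raw_data)  # plain imperative business logic\n",
    ],
    "metadata": {
        "workflow": {                        # assumed key: marks the cell as a workflow step
            "step": {"name": "preprocess"},
            "target": {                      # assumed key: where the step should run
                "model": "slurm",            # e.g., an HPC queue manager...
                "service": "gpu-partition",  # ...and a specific partition
            },
        }
    },
    "outputs": [],
    "execution_count": None,
}
```

Because the coordination metadata lives outside the `source` field, the cell still runs unchanged in a vanilla kernel, while a workflow-aware kernel can read the metadata to schedule the same code elsewhere.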

Highlights

  • Jupyter Notebooks’ capability to unify imperative code and declarative metadata in a single format puts them halfway between the two classes of tools commonly used for workflow modeling: high-level coordination languages and low-level distributed computing libraries

  • Jupyter Notebooks come with a feature-rich, user-friendly web interface out-of-the-box, making them far more accessible for domain experts than the SSH-based remote shells commonly exposed by High Performance Computing (HPC) facilities worldwide

  • However, the lack of support for complex workflows and the challenging integration with hybrid Cloud-HPC architectures have undoubtedly hampered Notebooks’ adoption in production workloads

Introduction

Jupyter Notebooks’ capability to unify imperative code and declarative metadata in a single format puts them halfway between the two classes of tools commonly used for workflow modeling: high-level coordination languages and low-level distributed computing libraries. We envision Jupyter Notebooks as the first representative of a new class of interfaces fostering both portability and performance, mapping Notebook cells to workflow steps and workflow steps to Cloud and HPC resources. The latter mapping leverages existing state-of-the-art tools for both Cloud (e.g., Kubernetes and Docker) and HPC (e.g., Slurm, PBS, and Singularity), inheriting their performance and portability. We extend the Jupyter Notebook kernel to support parallel and distributed execution of the Notebook cells, where a cell can drive the execution of a legacy parallel code, e.g., a Fortran+MPI application (see the sketch below). The challenge is to design an integration that supports productivity and portability without sacrificing HPC systems’ performance.
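As a concrete illustration of the last point, a cell’s imperative code can simply shell out to a legacy MPI binary; a workflow-aware kernel could then offload the whole cell to the HPC target declared in its metadata. The executable name, input file, and rank count below are hypothetical placeholders.

```python
# Contents of a Notebook cell driving a legacy Fortran+MPI application.
# With a workflow-aware kernel, this whole cell could be shipped to an
# HPC node and executed there; "./legacy_solver" and "input.nml" are
# hypothetical placeholders, not artifacts from the paper.
import subprocess

n_ranks = 64
completed = subprocess.run(
    ["mpirun", "-n", str(n_ranks), "./legacy_solver", "input.nml"],
    capture_output=True,  # collect stdout/stderr instead of streaming
    text=True,            # decode output as text
    check=True,           # raise CalledProcessError if the MPI job fails
)
print(completed.stdout)
```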
