Abstract

The designers of a new coordination interface for enacting complex workflows have to tackle a dichotomy: choosing a language-independent or a language-dependent approach. Language-independent approaches decouple workflow models from the host code’s business logic and promote portability. Language-dependent approaches foster flexibility and performance by adopting the same host language for business and coordination code. Jupyter Notebooks, with their capability to describe both imperative and declarative code in a single format, allow combining the best of the two approaches, maintaining a clear separation between application and coordination layers while still providing a unified interface to both aspects. We advocate Jupyter Notebooks’ potential to express complex distributed workflows, identifying the general requirements for a Jupyter-based Workflow Management System (WMS) and introducing a proof-of-concept portable implementation working on hybrid Cloud-HPC infrastructures. As a byproduct, we extended the vanilla IPython kernel with workflow-based parallel and distributed execution capabilities. The proposed Jupyter-workflow (Jw) system is evaluated on common scenarios for High Performance Computing (HPC) and Cloud, showing its potential in lowering the barriers between prototypical Notebooks and production-ready implementations.
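To make this duality concrete, the sketch below shows how a single Notebook cell, as stored in the .ipynb JSON file, can pair imperative Python code with declarative coordination metadata. The standard fields (`cell_type`, `source`, `metadata`, `outputs`) are part of the real Notebook format; the `workflow` key and its `step`/`target` contents are illustrative assumptions, not the actual Jw schema.

```python
# A minimal sketch of one Notebook cell as it appears inside the .ipynb
# JSON document. "cell_type", "source", "metadata", and "outputs" are
# genuine Notebook-format fields; the "workflow" metadata key and its
# contents are hypothetical, shown only to illustrate how declarative
# coordination data can sit next to imperative business logic.
cell = {
    "cell_type": "code",
    "source": [
        "result = preprocess(raw_data)  # plain imperative business logic\n",
    ],
    "metadata": {
        "workflow": {                        # assumed key: marks the cell as a workflow step
            "step": {"name": "preprocess"},
            "target": {                      # assumed key: where the step should run
                "model": "slurm",            # e.g., an HPC queue manager...
                "service": "gpu-partition",  # ...and a specific partition
            },
        }
    },
    "outputs": [],
    "execution_count": None,
}
```

Because the coordination metadata lives outside the `source` field, the cell still runs unchanged in a vanilla kernel, while a workflow-aware kernel can read the metadata to schedule the same code elsewhere.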

Highlights

  • Jupyter Notebooks’ capability to unify imperative code and declarative metadata in a single format puts them halfway between the two classes of tools commonly used for workflow modeling: high-level coordination languages and low-level distributed computing libraries

  • Jupyter Notebooks come with a feature-rich, user-friendly web interface out-of-the-box, making them far more accessible for domain experts than the SSH-based remote shells commonly exposed by High Performance Computing (HPC) facilities worldwide

  • However, the lack of support for complex workflows and the challenging integration with hybrid Cloud-HPC architectures have undoubtedly hampered Notebooks’ adoption in production workloads

Introduction

Jupyter Notebooks’ capability to unify imperative code and declarative metadata in a single format puts them halfway between the two classes of tools commonly used for workflow modeling: high-level coordination languages and low-level distributed computing libraries. We envision Jupyter Notebooks as the first representative of a new class of interfaces fostering both portability and performance, mapping Notebook cells to workflow steps and workflow steps to Cloud and HPC resources. The latter mapping leverages existing state-of-the-art tools for both Cloud (e.g., Kubernetes and Docker) and HPC (e.g., Slurm, PBS, and Singularity), inheriting their performance and portability. We extend the Jupyter Notebook kernel to support parallel and distributed execution of the Notebook cells, where a cell can drive the execution of a legacy parallel code, e.g., a Fortran+MPI application (see the sketch below). The challenge is to design an integration that supports productivity and portability without sacrificing HPC systems’ performance.
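As a concrete illustration of the last point, a cell’s imperative code can simply shell out to a legacy MPI binary; a workflow-aware kernel could then offload the whole cell to the HPC target declared in its metadata. The executable name, input file, and rank count below are hypothetical placeholders.

```python
# Contents of a Notebook cell driving a legacy Fortran+MPI application.
# With a workflow-aware kernel, this whole cell could be shipped to an
# HPC node and executed there; "./legacy_solver" and "input.nml" are
# hypothetical placeholders, not artifacts from the paper.
import subprocess

n_ranks = 64
completed = subprocess.run(
    ["mpirun", "-n", str(n_ranks), "./legacy_solver", "input.nml"],
    capture_output=True,  # collect stdout/stderr instead of streaming
    text=True,            # decode output as text
    check=True,           # raise CalledProcessError if the MPI job fails
)
print(completed.stdout)
```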
