Abstract

Over the last two decades, the field of computational science has seen a dramatic shift towards incorporating high-throughput computation and big-data analysis as fundamental pillars of the scientific discovery process. This has necessitated the development of tools and techniques for generating, storing and processing large amounts of data. In this work we present an in-depth look at the workflow engine powering AiiDA, a widely adopted, highly flexible and database-backed informatics infrastructure with an emphasis on data reproducibility. We detail many of the design choices, which were informed by several important goals: the ability to scale from individual laptops up to high-performance supercomputers; managing jobs with runtimes spanning from fractions of a second to weeks, and scaling up to thousands of concurrent jobs; and all this while maximising robustness. In short, AiiDA aims to be a Swiss army knife for high-throughput computational science. As well as the architecture, we outline important application programming interface (API) design choices made to give workflow writers a great deal of liberty whilst guiding them towards writing robust and modular workflows, ultimately enabling them to encode their scientific knowledge to the benefit of the wider scientific community.

Highlights

  • Computational power has increased steadily and tremendously over the past few decades, and the field of computational science has grown with it

  • We detail many of the design choices, which were informed by several important goals: the ability to scale from individual laptops up to high-performance supercomputers; managing jobs with runtimes spanning from fractions of a second to weeks, and scaling up to thousands of concurrent jobs; and all this while maximising robustness

  • As well as the architecture, we outline important application programming interface (API) design choices made to give workflow writers a great deal of liberty whilst guiding them towards writing robust and modular workflows, enabling them to encode their scientific knowledge to the benefit of the wider scientific community


Summary

INTRODUCTION

Computational power has increased steadily and tremendously over the past few decades, and the field of computational science has grown with it.

In some existing workflow engines, a workflow can insert new steps or spawn additional logical branches while it is running, based on intermediate results produced by previously completed steps. While this enables runtime-mutable workflows, the specific mutations are bound by the constraints of the custom static JSON markup language through which they are defined. Workflows in AiiDA, by contrast, are implemented directly in Python. They therefore have the full dynamic expressiveness of a programming language at their disposal, as well as full access to the entire provenance graph, including the data already stored in the database. This proves to be a very powerful mechanism for dealing with, for example, error handling when running high-throughput simulations. We first describe the user interface, followed by a technical description of the architecture and implementation of the engine.
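The contrast between mutations constrained by a static markup language and workflows written directly in Python can be illustrated with a minimal plain-Python sketch. It is not AiiDA's API. The names `run_simulation`, `adaptive_workflow` and `Result`, and the exit-status values, are all hypothetical, and the "simulation" is a toy stand-in. The sketch shows the two ideas named above. First, it handles a known failure mode by adjusting inputs and retrying. Second, it decides at runtime, from an intermediate result, whether to spawn additional steps.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical exit statuses, chosen only for illustration.
EXIT_OK = 0
EXIT_UNRECOVERABLE = 400
EXIT_NOT_CONVERGED = 410


@dataclass
class Result:
    exit_status: int
    energy: Optional[float] = None


def run_simulation(params: dict) -> Result:
    """Hypothetical stand-in for an external simulation code.

    It only 'converges' when the mixing parameter is small enough.
    """
    if params["mixing"] > 0.3:
        return Result(EXIT_NOT_CONVERGED)
    return Result(EXIT_OK, energy=-params["cutoff"] / 100.0)


def adaptive_workflow(
    params: dict, max_attempts: int = 5
) -> Tuple[int, List[Tuple[dict, Result]]]:
    """Run a simulation, handle a known failure mode, then branch on the result."""
    history: List[Tuple[dict, Result]] = []  # crude stand-in for recorded provenance
    params = dict(params)  # never mutate the caller's inputs

    # Error handling: inspect every attempt and decide at runtime how to proceed.
    for _ in range(max_attempts):
        result = run_simulation(params)
        history.append((dict(params), result))
        if result.exit_status == EXIT_OK:
            break
        if result.exit_status == EXIT_NOT_CONVERGED:
            params["mixing"] /= 2  # known remedy: damp the mixing and retry
            continue
        return EXIT_UNRECOVERABLE, history  # unknown failure: give up
    else:
        return EXIT_UNRECOVERABLE, history  # ran out of attempts

    # Runtime branching: whether follow-up steps exist depends on an
    # intermediate result that was unknown when the workflow started.
    extra_cutoffs = [params["cutoff"] * 2] if result.energy < -0.5 else []
    for cutoff in extra_cutoffs:
        branch_params = {**params, "cutoff": cutoff}
        history.append((branch_params, run_simulation(branch_params)))

    return EXIT_OK, history
```

For example, `adaptive_workflow({"mixing": 0.8, "cutoff": 60})` fails twice with the hypothetical non-convergence status. It succeeds once the mixing has been halved twice. It then spawns one extra run at double the cutoff, because the intermediate energy crossed the (arbitrary) threshold. Because the control flow is ordinary Python, such handlers and branches need no dedicated markup syntax.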

USER INTERFACE
Process specification
Ports and port namespaces
Inputs and outputs
Exit codes
Work functions
Work chains
Calculation jobs
ARCHITECTURE
The engine
Vertical scaling
The process
Persistence
Communication
CONCLUSIONS
