Abstract

Modern computational science and engineering is increasingly defined by multiphysics, multiscale simulation [5], while raising the level of abstraction to risk-aware design and decision problems. This evolution unavoidably involves deeper software stacks and the cooperation of distributed teams from multiple disciplines. Meanwhile, each application area continues to innovate and can be characterized as much by its forms of extensibility (e.g., boundary conditions, geometry, subgrid closures, analysis techniques, data sources, and inherent uncertainty/bias) as by its underlying equations. Sanitary workflow is paramount in this environment, but it is too often compromised so long as the original author's use case is deemed acceptable. We argue that many common approaches to configuration and extensibility create artificial bottlenecks that impede science goals, and that the only sustainable approach is to defer these choices to run-time. We present recommendations for implementing such an approach.

Compile-time configuration. The status quo for many applications, especially those written in legacy Fortran, is to perform configuration in the build system. From the perspective of higher-level analysis, the build system must then be thought of as the public application programming interface (API). In other applications, especially those written in C++ or with heavy use of conditional compilation, the choices must be made at compile time. Compute nodes often do not have access to compilers, making all build-system and compile-time decisions inaccessible to online analysis. It may even be impossible for the same application to run in two different configurations on different nodes or on different MPI communicators.
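To make the contrast concrete, here is a minimal sketch (ours, not taken from the text) of the same choice made both ways in C. The option name -limiter, the limiter functions, and the registry are hypothetical; mature run-time-configurable libraries such as PETSc follow the same pattern of string-keyed registries driven by command-line options.

#include <math.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Compile-time choice: the default is baked in by the build system and
 * cannot be changed without recompiling, which compute nodes often
 * cannot do. */
#ifdef USE_MINMOD
#  define DEFAULT_LIMITER "minmod"
#else
#  define DEFAULT_LIMITER "superbee"
#endif

/* Run-time choice: one binary, with the component selected by name, so
 * the decision can be recorded in a single configuration file and
 * varied per run (or per MPI communicator) by outer-loop analysis. */
typedef double (*Limiter)(double);

static double limiter_minmod(double r)   { return fmax(0.0, fmin(1.0, r)); }
static double limiter_superbee(double r) { return fmax(0.0, fmax(fmin(2.0*r, 1.0), fmin(r, 2.0))); }

static const struct { const char *name; Limiter fn; } registry[] = {
  {"minmod",   limiter_minmod},
  {"superbee", limiter_superbee},
};

static Limiter limiter_from_option(const char *name)
{
  for (size_t i = 0; i < sizeof registry / sizeof registry[0]; i++)
    if (strcmp(registry[i].name, name) == 0) return registry[i].fn;
  return NULL; /* unknown name: let the caller report a clear error */
}

int main(int argc, char **argv)
{
  /* e.g., ./app -limiter superbee */
  const char *choice = DEFAULT_LIMITER;
  for (int i = 1; i + 1 < argc; i++)
    if (strcmp(argv[i], "-limiter") == 0) choice = argv[i + 1];
  Limiter lim = limiter_from_option(choice);
  if (!lim) { fprintf(stderr, "unknown limiter: %s\n", choice); return 1; }
  printf("limiter(0.5) = %g\n", lim(0.5));
  return 0;
}

In this sketch the compile-time branch only changes a default; in the problematic applications described above, it selects among code paths that do not even exist in the compiled binary, which is what makes the decision invisible to online analysis.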
Advanced analysis. Today's physics models are increasingly used not just as forward models but as the target of advanced analysis techniques such as stochastic optimization, risk-aware decision making, and stability analysis. The forward model must then expose an interface for each form of modification that the analysis levels can explore. An interface requiring build-time modification shifts an unacceptable level of complexity to the analysis software and is algorithmically constraining: it limits parallelism, introduces artificial bottlenecks, and rules out some algorithms entirely.

Provenance and usability. Reproducibility and provenance are perpetual challenges of computational science that become more acute as the software stack deepens and more models of greater complexity are coupled. How can we capture the state of all configuration knobs so that a computational experiment can be reproduced? Compare the complexity of a single configuration file read at run-time with that of a heterogeneous configuration spread across multiple build systems, files passed from earlier stages of computation, and run-time options. Provenance is simplified when each package is used without modification, compiled in a standard way, and controlled entirely via run-time options. For both maintenance and provenance reasons, custom components needed for a given computational experiment are better placed in version-controlled plugins than introduced by modifying upstream sources. If a coherent top-level specification is to be supported in a system with build-time or source-level choices, those configuration options must be plumbed through all the intermediate levels, often resulting in another layer of "workflow" scripts and bloated, brittle high-level interfaces.

"Big" data. Workflows that involve multiple executables usually pass information through the file system. It takes about one hour to read or write the contents of volatile memory to global storage on today's top machines, even assuming peak I/O bandwidth is reached. The largest allocations are on the order of tens of millions of core hours (e.g., INCITE), meaning that the entire annual compute budget
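The one-hour figure follows from a back-of-the-envelope estimate: draining memory to storage takes the ratio of aggregate memory capacity M to sustained filesystem bandwidth B. The numbers below are illustrative assumptions consistent with that figure, not measurements of a particular machine:

\[ t = \frac{M}{B} \approx \frac{1.8\,\mathrm{PB}}{0.5\,\mathrm{TB/s}} = 3600\,\mathrm{s} \approx 1\ \text{hour}. \]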
