Abstract

The NSF-funded Scalable CyberInfrastructure for Artificial Intelligence and Likelihood Free Inference (SCAILFIN) project aims to develop and deploy artificial intelligence (AI) and likelihood-free inference (LFI) techniques and software using scalable cyberinfrastructure (CI) built on top of existing CI elements. Specifically, the project has extended the CERN-based REANA framework, a cloud-based data analysis platform deployed on top of Kubernetes clusters that was originally designed to enable analysis reusability and reproducibility. REANA can orchestrate extremely complicated multi-step workflows, and uses Kubernetes clusters both to schedule and distribute container-based workloads across a cluster of available machines and to instantiate and monitor the concrete workloads themselves. This work describes the challenges and development efforts involved in extending REANA, and the components that were developed to enable large-scale deployment on High Performance Computing (HPC) resources. Using the Virtual Clusters for Community Computation (VC3) infrastructure as a starting point, we adapted REANA to work with a number of different workload managers, both high-performance and high-throughput, while simultaneously removing REANA’s dependence on Kubernetes support at the worker level.

Highlights

  • The National Science Foundation (NSF) has made significant investments in major multiuser research facilities (MMURFs), which are the foundation for a robust data-intensive science program

  • Extracting scientific results from these facilities involves the comparison of “real” data collected from the experiments with “synthetic” data produced from computationally-intensive simulations. This is the modus operandi of MMURFs such as the Large Hadron Collider (LHC), IceCube Neutrino Observatory, and the Laser Interferometer Gravitational Wave Observatory (LIGO)

  • In recent years there has been a tremendous amount of interest in leveraging machine learning (ML) and artificial intelligence (AI) techniques to enhance the analysis of data from these facilities


Summary

Introduction and Motivation

The National Science Foundation (NSF) has made significant investments in major multiuser research facilities (MMURFs), which are the foundation for a robust data-intensive science program. In recent years there has been a tremendous amount of interest in leveraging machine learning (ML) and artificial intelligence (AI) techniques to enhance the analysis of data from these facilities. While these facilities have highly engineered systems for data acquisition, and there are corresponding systems for generating simulated data, several data analysis tasks are often shepherded manually or through ad hoc scripts that are not well maintained. NSF has supported the development of a new class of data analysis techniques that leverage ML to improve the discovery potential of MMURFs. Particularly significant is the emergence of a class of likelihood-free inference (LFI) techniques, which are needed when the predictions for the data are defined only implicitly by a simulation, a situation that often leads to an intractable likelihood function. The resulting applications will run on large-scale (e.g. cloud/HPC) resources to analyze LHC data.
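To make the idea of likelihood-free inference concrete, the following is a minimal sketch of approximate Bayesian computation (ABC) by rejection, one of the simplest LFI techniques. It is not the method used by SCAILFIN, which relies on ML-based approaches; it only illustrates inferring a parameter by comparing observed data with simulator output, without ever evaluating a likelihood. The simulator, the uniform prior, the summary statistic, and all names (`simulate`, `abc_rejection`, `tolerance`) are assumptions chosen for illustration.

```python
import random
import statistics


def simulate(theta, n, rng):
    """Hypothetical stand-in for an expensive simulator: n draws from Normal(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def abc_rejection(observed, prior_low, prior_high, n_draws, tolerance, seed=0):
    """Keep prior draws whose simulated data resemble the observed data.

    The likelihood is never evaluated: each candidate theta is judged only by
    how close a summary statistic (here, the sample mean) of its synthetic
    data lies to the same statistic of the observed data.
    """
    rng = random.Random(seed)
    observed_summary = statistics.fmean(observed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(prior_low, prior_high)       # draw from the (assumed) uniform prior
        synthetic = simulate(theta, len(observed), rng)  # run the simulator forward
        if abs(statistics.fmean(synthetic) - observed_summary) <= tolerance:
            accepted.append(theta)                       # approximate posterior sample
    return accepted


# "Real" data generated with a true parameter that the inference does not see.
observed = simulate(2.0, 50, random.Random(42))
posterior = abc_rejection(observed, prior_low=-5.0, prior_high=5.0,
                          n_draws=5000, tolerance=0.2)
print(len(posterior), statistics.fmean(posterior))
```

Each accepted draw requires a full simulator run, and most draws are rejected, so the cost grows quickly with the number of parameters. This is one reason LFI workloads call for large-scale cloud or HPC resources.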

Likelihood-Free Inference
The REANA System
SCAILFIN
Initial Development
Deployment
Workflows
Future Evolution
