Large-scale HPC deployment of Scalable CyberInfrastructure for Artificial Intelligence and Likelihood Free Inference (SCAILFIN)

Michael Hildreth, Scott Hampton, Paul Brenner, Cody Kankel, Tibor Simko, Irena Johnson, Kenyi Paolo Hurtado Anampa

doi:10.1051/epjconf/202024509011

Open Access

Abstract

The NSF-funded Scalable CyberInfrastructure for Artificial Intelligence and Likelihood Free Inference (SCAILFIN) project aims to develop and deploy artificial intelligence (AI) and likelihood-free inference (LFI) techniques and software using scalable cyberinfrastructure (CI) built on top of existing CI elements. Specifically, the project has extended the CERN-based REANA framework, a cloud-based data analysis platform deployed on top of Kubernetes clusters that was originally designed to enable analysis reusability and reproducibility. REANA is capable of orchestrating extremely complicated multi-step workflows, and uses Kubernetes clusters both for scheduling and distributing container-based workloads across a cluster of available machines and for instantiating and monitoring the concrete workloads themselves. This work describes the challenges and development efforts involved in extending REANA, and the components that were developed to enable large-scale deployment on High Performance Computing (HPC) resources. Using the Virtual Clusters for Community Computation (VC3) infrastructure as a starting point, we implemented REANA to work with a number of different workload managers, including both high-performance and high-throughput systems, while simultaneously removing REANA’s dependence on Kubernetes support at the worker level.

Highlights

The National Science Foundation (NSF) has made significant investments in major multiuser research facilities (MMURFs), which are the foundation for a robust data-intensive science program.
Extracting scientific results from these facilities involves the comparison of “real” data collected from the experiments with “synthetic” data produced from computationally intensive simulations. This is the modus operandi of MMURFs such as the Large Hadron Collider (LHC), the IceCube Neutrino Observatory, and the Laser Interferometer Gravitational-Wave Observatory (LIGO).
In recent years there has been a tremendous amount of interest in leveraging machine learning (ML) and artificial intelligence (AI) techniques to enhance the analysis of data from these facilities.

Summary

Introduction and Motivation

The National Science Foundation (NSF) has made significant investments in major multiuser research facilities (MMURFs), which are the foundation for a robust data-intensive science program. In recent years there has been a tremendous amount of interest in leveraging machine learning (ML) and artificial intelligence (AI) techniques to enhance the analysis of data from these facilities. While these facilities have highly engineered systems for data acquisition, and there are corresponding systems for generating simulated data, several data analysis tasks are often shepherded manually or through ad hoc scripts that are not well maintained. NSF has supported the development of a new class of data analysis techniques that leverage ML to improve the discovery potential of MMURFs. Particularly significant is the emergence of a class of likelihood-free inference (LFI) techniques, which are needed when the predictions for the data are implicitly defined by simulation, a situation that often leads to an intractable likelihood function. The resulting applications will run on large (e.g. cloud/HPC) resources to analyze LHC data.
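To make the idea of likelihood-free inference concrete, the following is a minimal sketch of one of the simplest LFI schemes, rejection-based approximate Bayesian computation (ABC). It is an illustration only and is not taken from the SCAILFIN or REANA code; the toy Gaussian simulator, the function names `simulate` and `abc_rejection`, the uniform prior, and the tolerance value are all assumptions chosen for the example. The key point is that the likelihood is never evaluated: parameters are kept only when data simulated from them resemble the observed data.

```python
import random
import statistics


def simulate(theta, n, rng):
    """Toy forward simulator (an assumption for illustration):
    n draws from a unit-variance Gaussian with mean theta."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def abc_rejection(observed, prior_low, prior_high, n_draws, tolerance, seed=0):
    """Rejection ABC: sample theta from a uniform prior, run the simulator,
    and accept theta when the simulated summary statistic (the sample mean)
    lies within `tolerance` of the observed one."""
    rng = random.Random(seed)
    observed_summary = statistics.fmean(observed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(prior_low, prior_high)
        synthetic = simulate(theta, len(observed), rng)
        if abs(statistics.fmean(synthetic) - observed_summary) < tolerance:
            accepted.append(theta)
    return accepted


# Stand-in for "real" data: generated here with a hidden true mean of 2.0.
observed = simulate(2.0, 50, random.Random(42))

# Accepted parameters approximate samples from the posterior over theta.
posterior = abc_rejection(observed, prior_low=-5.0, prior_high=5.0,
                          n_draws=5000, tolerance=0.1)
print(len(posterior), statistics.fmean(posterior))
```

Each prior draw requires an independent run of the simulator, so the loop is trivially parallel; with a realistic, expensive simulator this is the kind of workload that benefits from being distributed across cloud or HPC resources.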

Likelihood-Free Inference

The REANA System

SCAILFIN

Initial Development

Deployment

Workflows

Future Evolution

Journal: EPJ Web of Conferences
Publication Date: Jan 1, 2020
Citations: 1
License type: CC BY 4.0
