Scanflow-K8s: Agent-based Framework for Autonomic Management and Supervision of ML Workflows in Kubernetes Clusters

Peini Liu, Ajay Dholakia, Miroslav Hodak, David Ellison, Jordi Guitart, Gusseppe Bravo-Rocca

doi:10.1109/ccgrid54584.2022.00047


Open Access

https://doi.org/10.1109/ccgrid54584.2022.00047


Abstract

Machine Learning (ML) projects currently rely heavily on workflows that are composed of reproducible steps and executed as containerized pipelines to build or deploy ML models efficiently, because of the flexibility, portability, and fast delivery such pipelines provide to the ML life-cycle. However, deployed models need to be constantly watched, managed, supervised, and debugged to guarantee their availability, validity, and robustness in unexpected situations. Therefore, containerized ML workflows would benefit from leveraging flexible and diverse autonomic capabilities. This work presents an architecture for autonomic ML workflows with abilities for multi-layered control, based on an agent-based approach that enables autonomic management and supervision of ML workflows at the application layer and at the infrastructure layer (by collaborating with the orchestrator). We redesign the Scanflow ML framework to support such a multi-agent approach by using triggers, primitives, and strategies. We also implement a practical platform, called Scanflow-K8s, that enables autonomic ML workflows on Kubernetes clusters based on the Scanflow agents. MNIST image classification and MLPerf ImageNet classification benchmarks are used as case studies to show the capabilities of Scanflow-K8s under different scenarios. The experimental results demonstrate the feasibility and effectiveness of our proposed agent approach and the Scanflow-K8s platform for the autonomic management of ML workflows in Kubernetes clusters at multiple layers.
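
The abstract names triggers, primitives, and strategies as the building blocks of the agents but does not describe their interfaces. As a rough illustration only, the sketch below shows one way such a pattern could be wired together: a strategy pairs a trigger condition with the primitives to run when it fires. Every class, function, metric name, and threshold here is a hypothetical assumption made for this example and is not taken from the Scanflow or Scanflow-K8s code base.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# NOTE: all names and thresholds are hypothetical; this is not the Scanflow API.
Metrics = Dict[str, float]


@dataclass
class Strategy:
    """Pairs a trigger condition with the primitives (actions) to run."""
    name: str
    trigger: Callable[[Metrics], bool]
    primitives: List[Callable[[Metrics], str]]


@dataclass
class Agent:
    """A minimal agent that evaluates its strategies on each observation."""
    name: str
    strategies: List[Strategy] = field(default_factory=list)

    def observe(self, metrics: Metrics) -> List[str]:
        # Collect the actions of every strategy whose trigger fires.
        actions: List[str] = []
        for strategy in self.strategies:
            if strategy.trigger(metrics):
                actions.extend(p(metrics) for p in strategy.primitives)
        return actions


def request_retraining(metrics: Metrics) -> str:
    # Application-layer primitive (assumed): ask for the model to be retrained.
    return f"retrain model (accuracy={metrics['accuracy']:.2f})"


def request_scale_out(metrics: Metrics) -> str:
    # Infrastructure-layer primitive (assumed): ask the orchestrator for replicas.
    return f"scale out replicas (latency_ms={metrics['latency_ms']:.0f})"


supervisor = Agent(
    name="supervisor",
    strategies=[
        Strategy("accuracy_drop", lambda m: m["accuracy"] < 0.90, [request_retraining]),
        Strategy("slow_inference", lambda m: m["latency_ms"] > 200, [request_scale_out]),
    ],
)

healthy = supervisor.observe({"accuracy": 0.97, "latency_ms": 50.0})
degraded = supervisor.observe({"accuracy": 0.80, "latency_ms": 350.0})
print(healthy)
print(degraded)
```

In this sketch the primitives only return a description of the action they would take. A real agent would instead call the ML framework or the orchestrator at that point, which is where the application layer and the infrastructure layer mentioned in the abstract would meet.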


Publication Date: May 1, 2022
Citations: 3
License type: other-oa

Similar Papers

GradeML: Towards Holistic Performance Analysis for Machine Learning Workflows
Tim Hegeman ... Alexandru Iosup
19 Apr 2021

Cirrus
Joao Carreira ... Randy Katz
20 Nov 2019

Security in Machine Learning (ML) Workflows
Dinesh Reddy Chittibala ... Srujan Reddy Jabbireddy
International Journal of Computing and Engineering | VOL. 5
02 Mar 2024

Machine Learning Operations (MLOps): Challenges and Strategies
Amandeep Singla
Journal of Knowledge Learning and Science Technology ISSN: 2959-6386 (online) | VOL. 2
14 Aug 2023
