Experiences with the ALICE Mesos infrastructure

D Berzano, C Grigoraş, K Napoli, G Eulisse

doi:10.1088/1742-6596/898/8/082043

Abstract

Apache Mesos is a resource management system for large data centres, initially developed by UC Berkeley and now maintained under the Apache Foundation umbrella. It is widely used in industry by companies such as Apple, Twitter, and Airbnb, and it is known to scale to tens of thousands of nodes. Together with other tools of its ecosystem, such as Mesosphere Marathon or Metronome, it provides an end-to-end solution for data centre operations and a unified way to exploit large distributed systems. We present the experience of the ALICE Experiment Offline & Computing in deploying and using in production the Apache Mesos ecosystem for a variety of tasks on a small 500-core cluster, using hybrid OpenStack and bare-metal resources. We first introduce the architecture of our setup and its operation, then describe the tasks it performs, including release building and QA, release validation, and simple Monte Carlo production. We show how we developed Mesos-enabled components (called “Mesos Frameworks”) to meet ALICE-specific needs. In particular, we illustrate our effort to integrate Work Queue, a lightweight batch processing engine developed by the University of Notre Dame, which ALICE uses to orchestrate release validation. Finally, we give an outlook on how to use Mesos as the resource manager for DDS, a software deployment system developed by GSI which will be the foundation of the system deployment for the ALICE next-generation Online-Offline (O2).

Highlights

We present the experience of the ALICE Experiment Offline & Computing in deploying and using in production the Apache Mesos ecosystem for a variety of tasks on a small 500-core cluster, using hybrid OpenStack and bare-metal resources.
We give an outlook on how to use Mesos as the resource manager for DDS, a software deployment system developed by GSI which will be the foundation of the system deployment for the ALICE next-generation Online-Offline (O2).
The main ones used for offline software purposes are the Release Validation cluster [1] and the PROOF-based Virtual Analysis Facility [2]: both clusters are virtual and run on resources supplied by the CERN OpenStack instance [3].

Summary

Introduction

ALICE Mesos cluster setup at CERN

Mesos has a two-level architecture: a certain number of masters control the infrastructure and the registered frameworks, while a large number of agents, one per worker node, are in charge of deploying, running and garbage-collecting tasks. All three approaches with multiple frameworks can work together on a single set of Mesos resources, as Mesos was designed to orchestrate different use cases at the same time.
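
The two-level split between masters, agents and frameworks can be illustrated with a small toy model. This is a minimal sketch, not Mesos code and not the ALICE setup: the class names `Master`, `Agent` and `Framework`, the CPU-only resource model, and the node and task names are all assumptions made for illustration. It relies only on the general Mesos idea that a master offers agent resources to registered frameworks, each of which decides what to run with them.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical agent: one per worker node, runs the tasks placed on it."""
    name: str
    free_cpus: float
    tasks: list = field(default_factory=list)

    def launch(self, task_name: str, cpus: float) -> None:
        # Deploy a task and reserve its CPUs on this node.
        self.free_cpus -= cpus
        self.tasks.append((task_name, cpus))

    def finish(self, task_name: str) -> None:
        # Stand-in for task completion plus garbage collection:
        # the task is dropped and its CPUs become available again.
        for entry in list(self.tasks):
            if entry[0] == task_name:
                self.tasks.remove(entry)
                self.free_cpus += entry[1]


@dataclass
class Framework:
    """Hypothetical framework holding a queue of (task name, cpus) requests."""
    name: str
    pending: list

    def consider(self, offered_cpus: float) -> list:
        # Second level: the framework itself picks which pending
        # tasks fit into the CPUs it has been offered.
        accepted = []
        for task in list(self.pending):
            if task[1] <= offered_cpus:
                accepted.append(task)
                self.pending.remove(task)
                offered_cpus -= task[1]
        return accepted


class Master:
    """First level: knows the agents and frameworks and hands out offers."""

    def __init__(self, agents, frameworks):
        self.agents = agents
        self.frameworks = frameworks

    def offer_round(self) -> None:
        # Offer each agent's free CPUs to every framework in turn.
        for agent in self.agents:
            for framework in self.frameworks:
                for task_name, cpus in framework.consider(agent.free_cpus):
                    agent.launch(f"{framework.name}/{task_name}", cpus)


# Two frameworks share the same two worker nodes (all names are made up).
agents = [Agent("node-a", 4.0), Agent("node-b", 2.0)]
builds = Framework("builds", [("compile", 3.0), ("test", 2.0)])
validation = Framework("validation", [("check", 1.0)])
master = Master(agents, [builds, validation])
master.offer_round()

placement = {agent.name: [name for name, _ in agent.tasks] for agent in agents}
print(placement)
```

In this sketch the master never inspects the task queues: it only circulates offers, while each framework applies its own placement logic. That separation is what allows several frameworks with different purposes to share one pool of resources.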


Journal: Journal of Physics: Conference Series
Publication Date: Oct 1, 2017
License type: CC BY
