Abstract

During the past two years large parts of the CERN batch farm have been moved to virtual machines running on the CERN internal cloud. During this process a large fraction of the resources, which had previously been used as physical batch worker nodes, were converted into hypervisors. Because the heterogeneous nature of the farm causes a large spread in per-core performance, good knowledge of the performance of the virtual machines is necessary. This information is used both for scheduling in the batch system and for accounting. In the previous setup worker nodes were classified and benchmarked based on the purchase order number; for virtual batch worker nodes this is no longer possible, because the information is now either hidden or hard to retrieve. We therefore developed a new scheme to classify worker nodes according to their performance. The new scheme is flexible enough to be usable for both virtual and physical machines in the batch farm. With the new classification it is possible to estimate the performance of worker nodes even in a very dynamic farm, with worker nodes coming and going at a high rate, without the need to benchmark each new node again. An extension to public cloud resources is possible if all conditions under which the benchmark numbers were obtained are fulfilled.

Highlights

  • Encouraged by the experience gained from a small prototype of a private cloud running both direct experiment payloads and parts of the CERN batch farm [1], more than 90% of the CERN batch farm has been virtualised during the past two years.

  • Benchmarking individual worker nodes when they are created is not an option, because such virtual machines in general do not fill a whole hypervisor, which can lead to over-optimistic benchmarking results. To work around this issue we have introduced a new scheme to classify worker nodes, which makes it possible to benchmark only a subset of nodes and generalise the results to all hosts of the same category.

  • The classification relies on detailed information that is retrieved from the worker node itself. While this is fine for a traditional batch farm, it would be desirable to use the same classification, for accounting purposes, for nodes which belong to other users of the Infrastructure as a Service (IaaS) infrastructure.

Summary

Introduction

Encouraged by the experience gained from a small prototype of a private cloud running both direct experiment payloads and parts of the CERN batch farm [1], more than 90% of the CERN batch farm has been virtualised during the past two years. Benchmarking individual worker nodes when they are created is not an option, because such virtual machines in general do not fill a whole hypervisor, which can lead to over-optimistic benchmarking results. To work around this issue we have introduced a new scheme to classify worker nodes, which makes it possible to benchmark only a subset of nodes and generalise the results to all hosts of the same category. A caveat of this classification is that it requires detailed information about the worker node, which is easy to retrieve from the worker node itself. While this is fine for a traditional batch farm, it would be desirable to use the same classification, for accounting purposes, for nodes which belong to other users of the Infrastructure as a Service (IaaS) infrastructure. Options to allow for this are discussed in the second part of this paper.
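
The idea of benchmarking only a subset of nodes and generalising the result to all hosts of the same category can be illustrated with a minimal sketch. The attributes that form the category key below (CPU model, core count, memory size) and all function names are assumptions made for illustration only; this summary does not state which properties the actual classification uses, nor how scores within a category are combined (a plain mean is assumed here).

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, NamedTuple, Optional, Tuple


class Node(NamedTuple):
    """A worker node; all fields are hypothetical example attributes."""
    name: str
    cpu_model: str
    cores: int
    memory_gb: int
    score: Optional[float] = None  # per-core benchmark score, if measured


Category = Tuple[str, int, int]


def category(node: Node) -> Category:
    """Build the classification key from properties read on the node itself."""
    return (node.cpu_model, node.cores, node.memory_gb)


def category_scores(nodes: Iterable[Node]) -> Dict[Category, float]:
    """Average the scores of the benchmarked subset within each category."""
    samples = defaultdict(list)
    for node in nodes:
        if node.score is not None:
            samples[category(node)].append(node.score)
    return {cat: mean(values) for cat, values in samples.items()}


def estimate(node: Node, scores: Dict[Category, float]) -> Optional[float]:
    """Return the category score for a node, or None if the category is unknown."""
    return scores.get(category(node))
```

A node that appears later in a dynamic farm is then assigned the score of its category without being benchmarked itself; a node whose category has never been benchmarked yields `None`, signalling that a benchmark run for that category is still needed.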

The CERN batch farm
Classification of worker nodes
Run in 32bit mode for historical reasons
Deployment of new resources
Re-benchmarking of already deployed resources
Accounting
Batch Accounting
Outlook
Conclusions