A Feasibility Study on workload integration between HTCondor and Slurm Clusters

R Du, J Zou, J Shi, Z Sun, X Jiang, G Chen, A Forti, M Litmaath, L Betev, P Hristov, O Smirnova

doi:10.1051/epjconf/201921408004

Abstract

Two production clusters coexist at the Institute of High Energy Physics (IHEP). One is a High Throughput Computing (HTC) cluster with HTCondor as the workload manager; the other is a High Performance Computing (HPC) cluster with Slurm as the workload manager. The resources of the HTCondor cluster are funded by multiple experiments, and resource utilization has reached more than 90% through a dynamic resource-sharing mechanism. Nevertheless, a bottleneck arises when several experiments request more resources at the same moment. Parallel jobs running on the Slurm cluster, on the other hand, have specific attributes: a high degree of parallelism, low job counts, and long wall times. These attributes tend to leave free resource slots that are suitable for jobs from the HTCondor cluster. A mechanism that transparently schedules jobs from the HTCondor cluster onto the Slurm cluster would therefore improve the resource utilization of the Slurm cluster and reduce job queue time for the HTCondor cluster. In these proceedings, we present three methods to migrate HTCondor jobs to the Slurm cluster and conclude that HTCondor-C is the preferred one. Furthermore, because the design philosophies and application scenarios of HTCondor and Slurm differ, some issues related to job scheduling and possible solutions are presented.

Highlights

The resource utilization ratio of the HTCondor cluster has reached more than 90%, which means that resource provision has become a bottleneck for now
There are two local computing clusters at the Institute of High Energy Physics (IHEP): one is an HTCondor cluster, the other is a Slurm cluster
Most jobs running on the HTCondor cluster are single-core jobs, while parallel and multi-core jobs run on the Slurm cluster

Summary

Introduction

The resource utilization ratio of the HTCondor cluster has reached more than 90%, which means that resource provision has become a bottleneck for now. The workload of the Slurm cluster is relatively light, with a resource utilization ratio of 50% on average. If HTCondor jobs could be migrated to and run on the Slurm cluster, users of the HTCondor cluster would have more resources to run their jobs, and the resource utilization ratio of the Slurm cluster would increase at the same time. To examine whether workload integration between the HTCondor and Slurm clusters is feasible, Section 2 lists and compares related and similar works.

Related works

Job migration

Overlap

Flocking

HTCondor-C

The issue of large job quantity

The issue of resource sharing

The issue of system environment

Findings

Conclusion


Journal: EPJ Web of Conferences	Publication Date: Jan 1, 2019
Citations: 3	License type: CC BY 4.0
