Abstract

Servers in a data center are underutilized due to over-provisioning, which contributes heavily to the high power consumption of data centers. Recent research in optimizing the energy consumption of High Performance Computing (HPC) data centers mostly focuses on consolidation of Virtual Machines (VMs) and on dynamic voltage and frequency scaling (DVFS). These approaches are inherently hardware-based, are frequently unique to individual systems, and often rely on simulation due to lack of access to HPC data centers. Other approaches require profiling information on the jobs in the HPC system to be available before run-time. In this paper, we propose a reinforcement learning based approach, which jointly optimizes profit and energy in the allocation of jobs to available resources, without the need for such prior information. The approach is implemented in a software scheduler used to allocate real applications from the Princeton Application Repository for Shared-Memory Computers (PARSEC) benchmark suite to a number of hardware nodes realized with Odroid-XU3 boards. Experiments show that the proposed approach increases the profit earned by 40% while simultaneously reducing energy consumption by 20% when compared to a heuristic-based approach. We also present a network-aware server consolidation algorithm called Bandwidth-Constrained Consolidation (BCC) for HPC data centers, which can address the under-utilization problem of the servers. Our experiments show that the BCC consolidation technique can reduce the power consumption of a data center by up to 37%.
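The paper's own BCC algorithm is only described in the full text. As a loose, generic illustration of bandwidth-constrained consolidation — not the authors' BCC — the sketch below packs VMs onto as few servers as possible with a first-fit-decreasing heuristic that respects both a core capacity and a per-server bandwidth budget (all function names, parameters, and numbers are hypothetical):

```python
def consolidate(vms, core_cap, bw_cap):
    """First-fit-decreasing packing of VMs onto as few servers as possible.

    vms      -- list of (cores, bandwidth) demands, one tuple per VM
    core_cap -- processing cores available on each server
    bw_cap   -- aggregate network bandwidth budget per server

    Returns a list of [used_cores, used_bw] pairs, one per powered-on server.
    """
    servers = []
    for cores, bw in sorted(vms, reverse=True):  # place the largest VMs first
        for s in servers:
            # Place the VM on the first server that satisfies BOTH limits.
            if s[0] + cores <= core_cap and s[1] + bw <= bw_cap:
                s[0] += cores
                s[1] += bw
                break
        else:
            servers.append([cores, bw])  # no fit: power on a new server
    return servers

# Four VMs packed onto servers with 4 cores and 40 bandwidth units each.
packed = consolidate([(2, 10), (2, 10), (4, 30), (1, 5)], core_cap=4, bw_cap=40)
```

Idle servers left empty by such a packing can be switched off, which is the source of the consolidation power saving; the bandwidth constraint keeps the packing from overloading any one server's network links.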

Highlights

  • High Performance Computing (HPC) data centers typically contain a large number of computing nodes, each consisting of multiple processing cores

  • We evaluate the network-level performance of the consolidation algorithm in an HPC data center through network simulations

  • At lower injection rates, replacing Bandwidth-Constrained Consolidation (BCC) with Clustered Exhaustive Search (CES) or greedy-approach-based consolidation (GRD) produced no significant difference in achieved throughput for either the S2S-WiDCN or the fat-tree network


Summary

Introduction

High Performance Computing (HPC) data centers typically contain a large number of computing nodes, each consisting of multiple processing cores. In HPC systems, the scheduling of jobs is influenced by their value; typically, a resource management system will attempt to maximize its profit by allocating its limited resources to the highest-value jobs in the queue. This is especially true when jobs arrive at a rate higher than the rate at which the system can process and execute them. Server-centric wireless DCNs, where direct wireless links are used for server-to-server communication, have been designed [24,25]. These wireless data center architectures can be considered viable alternatives to traditional wired architectures for HPC computing, further reducing power consumption.
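The section outline indicates the scheduler builds on an adapted multi-armed bandit model with an Upper Confidence Bound (UCB) algorithm. As a rough, generic illustration of UCB1 arm selection — not the authors' CBA algorithm — the sketch below repeatedly picks the job class whose estimated value plus confidence bonus is highest (the reward model and all names are hypothetical):

```python
import math

def ucb1_select(counts, rewards, t):
    """Return the index of the arm with the highest UCB1 score.

    counts[i]  -- how many times arm i has been chosen so far
    rewards[i] -- cumulative reward observed for arm i
    t          -- current round (1-indexed)
    """
    # Play every arm once before applying the confidence bound.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    scores = [rewards[i] / counts[i] + math.sqrt(2.0 * math.log(t) / counts[i])
              for i in range(len(counts))]
    return max(range(len(scores)), key=scores.__getitem__)

# Demo: three hypothetical job classes with fixed per-allocation values.
means = [0.2, 0.9, 0.5]
counts = [0, 0, 0]
rewards = [0.0, 0.0, 0.0]
for t in range(1, 201):
    arm = ucb1_select(counts, rewards, t)
    counts[arm] += 1
    rewards[arm] += means[arm]  # deterministic reward for simplicity
# The highest-value class (index 1) ends up chosen most often.
```

The confidence term shrinks as an arm is sampled more, so the selector explores under-sampled job classes early and concentrates on the highest-value class over time, without any profiling information being available before run-time.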

Related Work
System and Problem Definition for Scheduling Problem
HPC System
Jobs and Value Curves
Problem Definition
Objective
Adapted Multi-Armed Bandit Model
Upper Confidence Bound Algorithm
Proposed Algorithm for Confidence-Based Approach
Network Aware Server Consolidation
Traffic Pattern Model
The Network-Aware Consolidation Algorithm
Complexity Analysis
Optimizing the Inter-Consolidation Time
Experimental Results
Experimental Results for CBA Algorithm
Experimental Baselines
Profit and Energy Consumption Results at Varied Arrival Rates
Percentage of Zero-Value Jobs
Overhead Analysis
Experimental Results for BCC Algorithm
Traffic Generation and Simulation Platform for BCC
Power Consumption Analysis of BCC
Performance Analysis of BCC
Accuracy of Inter-Consolidation Time Modeling
Overall Power Saving with a Combination of BCC and CBA
Conclusions