Thermal‐aware task assignments in high performance computing clusters

Sanjay Kulkarni, Xiao Qin, Shubbhi Taneja, Yi Zhou

doi:10.1002/cpe.4206

Abstract

Cluster‐level thermal management has gained much attention over the past decade due to the rising cooling costs of data centers. In this research, we propose and implement a static scheduler called SSched and a dynamic one named DSched. These two algorithms schedule jobs based on the CPU and disk temperatures of a Hadoop cluster's nodes. Our schedulers rely on a monitoring mechanism that tracks CPU and disk utilization, and they keep CPU and disk temperatures below a threshold through thermal‐aware scheduling decisions. To facilitate the design of SSched and DSched, we classify jobs into CPU‐intensive and disk‐intensive categories. When a job arrives, SSched retrieves its utilization statistics from a profiled log, estimates its thermal behavior, and places the job on a NodeManager so as to minimize thermal impact. Unlike SSched, DSched improves the thermal efficiency of Hadoop clusters through dynamic load balancing. DSched keeps track of the coolest and hottest nodes in the cluster; if a hot spot is detected, tasks are migrated from hot nodes to cool ones. To evaluate the effectiveness of our schedulers, we track the average CPU and disk temperatures of each node while maintaining an optimal outlet temperature across the cluster. We demonstrate that, compared with the traditional Hadoop scheduler, SSched and DSched reduce cooling cost by approximately 15% with little performance overhead.
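The two scheduling ideas described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Node` class, the `place_job` and `rebalance` functions, the temperature values, and the single-threshold rule are all hypothetical names and assumptions introduced here for illustration. The sketch assumes that a static placement picks the node that is coolest on the component the job stresses, and that a dynamic pass moves one task from the hottest node to the coolest node when the hottest node exceeds a threshold.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical view of a cluster node with monitored temperatures."""
    name: str
    cpu_temp: float   # degrees Celsius (assumed unit)
    disk_temp: float  # degrees Celsius (assumed unit)
    tasks: list = field(default_factory=list)

    def temp_for(self, job_kind):
        # A CPU-intensive job mainly heats the CPU; otherwise look at the disk.
        return self.cpu_temp if job_kind == "cpu" else self.disk_temp

    def hottest_component(self):
        return max(self.cpu_temp, self.disk_temp)


def place_job(nodes, job, job_kind):
    """Static placement (assumed rule): choose the node that is coolest
    on the component this job category stresses."""
    target = min(nodes, key=lambda n: n.temp_for(job_kind))
    target.tasks.append(job)
    return target


def rebalance(nodes, threshold):
    """Dynamic pass (assumed rule): if the hottest node is above the
    threshold, migrate one of its tasks to the coolest node."""
    hottest = max(nodes, key=Node.hottest_component)
    coolest = min(nodes, key=Node.hottest_component)
    if (hottest.hottest_component() <= threshold
            or hottest is coolest
            or not hottest.tasks):
        return None  # no hot spot, or nothing to migrate
    task = hottest.tasks.pop()
    coolest.tasks.append(task)
    return task, hottest.name, coolest.name


nodes = [
    Node("n1", cpu_temp=70.0, disk_temp=40.0, tasks=["t1"]),
    Node("n2", cpu_temp=50.0, disk_temp=45.0),
    Node("n3", cpu_temp=55.0, disk_temp=35.0),
]
cpu_target = place_job(nodes, "j1", "cpu")    # coolest CPU
disk_target = place_job(nodes, "j2", "disk")  # coolest disk
migration = rebalance(nodes, threshold=65.0)  # n1 exceeds the threshold
```

In this toy setup the CPU-intensive job goes to the node with the lowest CPU temperature, the disk-intensive job goes to the node with the lowest disk temperature, and the rebalancing pass moves one task off the node that exceeds the threshold. A real scheduler would also need the profiled utilization log and a thermal estimate per job, which this sketch leaves out.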

Full Text