Abstract

Although GPUs have been indispensable in data centers, meeting the Quality of Service (QoS) under task consolidation on GPU is extremely challenging. Previous works mostly rely on the static task or resource scheduling and cannot handle the QoS violation during runtime. In addition, existing works fail to exploit the computing characteristics of batch tasks, and thus waste the opportunities to reduce power consumption while improving GPU utilization. To address the above problems, we propose a new runtime mechanism SMQoS that can dynamically adjust the resource allocation during runtime to meet the QoS of latency-sensitive (LS) tasks and determine the optimal resource allocation for batch tasks to improve GPU utilization and power efficiency. We implement the proposed mechanism on both simulator (SMQoS) and real GPU hardware (RH-SMQoS). The experimental results show that both SMQoS and RH-SMQoS can achieve better QoS for LS tasks and higher throughput for batch tasks compared to the state-of-the-art works. With hardware extension, the SMQoS can further reduce the power consumption by power gating idle computing resources.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call