Abstract

Meeting the Quality of Service (QoS) requirement under task consolidation on the GPU is extremely challenging. Previous work mostly relies on static task or resource scheduling and cannot handle QoS violations at runtime. In addition, existing work fails to exploit the computing characteristics of batch tasks, and thus misses opportunities to reduce power consumption while improving GPU utilization. To address these problems, we propose SMQoS, a runtime mechanism that dynamically adjusts resource allocation to satisfy the QoS of latency-sensitive tasks and determines the optimal resource allocation for batch tasks to improve GPU utilization and power efficiency. The experimental results show that with SMQoS, 2.27% and 7.58% more task co-runs reach the 95% QoS target than with Spart and Rollover, respectively. In addition, SMQoS achieves 23.9% and 32.3% higher throughput and reduces power consumption by 25.7% and 10.1%, compared to Spart and Rollover respectively.
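To make the idea of runtime resource adjustment concrete, the sketch below illustrates the kind of feedback loop the abstract describes: periodically compare the latency-sensitive (LS) task's measured latency against its QoS target and shift streaming multiprocessor (SM) quota between the LS task and batch tasks. This is a minimal illustration under assumed interfaces; the monitor, allocator, thresholds, and step size are hypothetical and are not taken from the paper's actual design.

```python
# Hypothetical sketch of a QoS-driven SM allocation loop (not the paper's API).

QOS_TARGET_MS = 10.0   # assumed QoS latency target for the LS task
TOTAL_SMS = 80         # assumed number of SMs on the GPU
STEP = 2               # assumed adjustment granularity (SMs per interval)

def adjust_allocation(ls_sms, measured_latency_ms):
    """Return a new SM count for the LS task based on observed latency."""
    if measured_latency_ms > QOS_TARGET_MS:
        # QoS violated: give the LS task more SMs, taking them from batch tasks.
        return min(TOTAL_SMS - 1, ls_sms + STEP)
    if measured_latency_ms < 0.9 * QOS_TARGET_MS:
        # QoS slack: release SMs to batch tasks to raise utilization.
        return max(1, ls_sms - STEP)
    return ls_sms

class SimulatedAllocator:
    """Stands in for the GPU-side SM partitioning mechanism (assumption)."""
    def __init__(self, ls_sms):
        self.ls_sms = ls_sms
    def set_partition(self, ls_sms, batch_sms):
        self.ls_sms = ls_sms

class SimulatedMonitor:
    """Stands in for a real latency probe; here latency simply shrinks
    as the LS task receives more SMs."""
    def __init__(self, allocator):
        self.allocator = allocator
    def sample_ls_latency_ms(self):
        return 800.0 / max(1, self.allocator.ls_sms)

def runtime_loop(intervals=20):
    allocator = SimulatedAllocator(ls_sms=TOTAL_SMS // 2)  # even split to start
    monitor = SimulatedMonitor(allocator)
    for _ in range(intervals):
        latency = monitor.sample_ls_latency_ms()
        ls_sms = adjust_allocation(allocator.ls_sms, latency)
        allocator.set_partition(ls_sms=ls_sms, batch_sms=TOTAL_SMS - ls_sms)
    return allocator.ls_sms

if __name__ == "__main__":
    print("LS task SMs after convergence:", runtime_loop())
```

In this toy setting the loop converges to roughly the smallest LS partition that still meets the latency target, leaving the remaining SMs to batch tasks, which is the intuition behind improving utilization and power efficiency without violating QoS.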
