Abstract

Modern General-Purpose Graphics Processing Units (GPGPUs) offer high throughput for parallel applications through their hundreds of integrated cores. However, some applications experience performance saturation, and even degradation, as the number of cores increases. At present, the scheduler in the GPU hardware allocates all available resources to an application in order to maximize resource utilization. We observed that applications prefer specific sets of resources, and that engaging the remaining, redundant resources can reduce application throughput. To overcome this problem, we first classify applications into two types: type-I applications, which predominantly require processing cores, and type-II applications, which rely on the performance of the memory system. Based on this classification, we propose an Application-aware Scalable Architecture (ApSA) for GPGPUs that tailors the GPU resources at run time to present an optimal set of resources to the running application. The results are analyzed and compared in terms of instructions per cycle (IPC), bandwidth utilization, and branch divergence. With the proposed technique, the average profiling overhead is 1.6% for applications identified as type-I and 1.15% for type-II applications. The average power saved by clock-gating redundant resources for type-II applications is 20.08%.
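As an illustration only, the type-I/type-II distinction can be approximated by checking how IPC scales with the number of active cores during a short profiling phase. The sketch below is not the ApSA classifier. The names `ProfileSample` and `classify_application`, the scaling-efficiency metric, and the 0.5 threshold are all assumptions introduced here for the example.

```python
from dataclasses import dataclass


@dataclass
class ProfileSample:
    """One hypothetical profiling measurement (not from the paper)."""
    active_cores: int  # number of cores enabled during the sample
    ipc: float         # instructions per cycle observed in the sample


def classify_application(samples, scaling_threshold=0.5):
    """Label an application 'type-I' (core-bound) or 'type-II' (memory-bound).

    Assumption: an application whose IPC keeps growing with the core count
    is core-bound, while one whose IPC saturates or degrades is limited by
    the memory system. The threshold value is illustrative.
    """
    if len(samples) < 2:
        raise ValueError("need at least two profiling samples")

    ordered = sorted(samples, key=lambda s: s.active_cores)
    first, last = ordered[0], ordered[-1]
    if last.active_cores == first.active_cores or first.ipc <= 0:
        raise ValueError("samples must differ in core count and have positive IPC")

    core_ratio = last.active_cores / first.active_cores
    ipc_ratio = last.ipc / first.ipc

    # Scaling efficiency: 1.0 means IPC grew linearly with the core count,
    # 0.0 means no gain, and negative values mean degradation.
    efficiency = (ipc_ratio - 1.0) / (core_ratio - 1.0)
    return "type-I" if efficiency >= scaling_threshold else "type-II"
```

In this sketch, an application whose IPC rises from 100 to 380 when going from 4 to 16 cores is labeled type-I, while one whose IPC rises only from 100 to 110 is labeled type-II and would be a candidate for clock-gating the surplus cores.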
