Abstract

Graphics Processing Units (GPUs) built on the Single Instruction Multiple Thread (SIMT) architecture are emerging as more efficient platforms for exploiting parallelism than Multiple Instruction Multiple Data (MIMD) architectures. A GPU has numerous shader cores and thousands of simultaneously active fine-grained threads. These threads are grouped into Cooperative Thread Arrays (CTAs), and the threads within a CTA are further grouped into warps. Although all warps of a CTA are scheduled for execution on the same core, hardware constraints allow only one warp to execute at a time. A GPU therefore also exploits parallelism by employing multiple shader cores to execute multiple warps simultaneously. We analyse the off-chip DRAM bandwidth utilization of different kinds of GPGPU applications on a system with 30 shader cores. These applications are categorized as type-I or type-II depending on their bandwidth requirements and their performance as a percentage of the maximum achievable performance. We observe that, for the baseline configuration, applications with bandwidth utilization below 34% benefit from additional cores, leading to improved performance. Applications with bandwidth utilization above 34% experience either degradation or insignificant improvement in performance with extra cores, but their performance improves significantly when the memory subsystem is scaled up.
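The thread hierarchy summarised above (threads grouped into warps, warps grouped into CTAs) can be sketched with simple index arithmetic. The sizes used here are illustrative assumptions (32 threads per warp, as on NVIDIA hardware, and a hypothetical 256-thread CTA), not figures from the paper:

```python
# Illustrative sketch of the SIMT thread hierarchy: threads -> warps -> CTAs.
# WARP_SIZE = 32 matches NVIDIA GPUs; CTA_SIZE = 256 is a hypothetical choice.
WARP_SIZE = 32
CTA_SIZE = 256  # threads per CTA (assumed for illustration)

def locate_thread(global_tid: int) -> dict:
    """Map a global thread id to its CTA, warp within that CTA, and lane."""
    cta_id = global_tid // CTA_SIZE
    tid_in_cta = global_tid % CTA_SIZE
    warp_id = tid_in_cta // WARP_SIZE   # which warp inside the CTA
    lane_id = tid_in_cta % WARP_SIZE    # position inside the warp
    return {"cta": cta_id, "warp": warp_id, "lane": lane_id}

# Example: global thread 300 lands in CTA 1, warp 1, lane 12.
print(locate_thread(300))
```

All threads sharing the same `cta` value would be scheduled on one shader core, while only one warp of that CTA executes at any instant, which is why adding cores (more CTAs in flight) is the GPU's other axis of parallelism.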
