Abstract

Graphics Processing Units (GPUs) built on the Single Instruction Multiple Thread (SIMT) architecture are emerging as more efficient platforms for exploiting parallelism than Multiple Instruction Multiple Data (MIMD) architectures. A GPU has numerous shader cores and thousands of simultaneously active fine-grained threads. These threads are grouped into Cooperative Thread Arrays (CTAs), and the threads within a CTA are further grouped into warps. Although all warps of a CTA are scheduled for execution on the same core, hardware constraints allow only one warp to execute at a time. A GPU therefore also exploits parallelism by employing multiple shader cores to execute multiple warps simultaneously. We analyse the off-chip DRAM bandwidth utilization of different kinds of GPGPU applications on a system with 30 shader cores. These applications are categorized as type-I or type-II depending on their bandwidth requirements and their performance as a percentage of the maximum achievable performance. We observe that, for the baseline configuration, applications with bandwidth utilization below 34% benefit from additional cores, leading to improved performance. Applications with bandwidth utilization above 34% experience either degradation or insignificant improvement in performance with extra cores, but their performance improves significantly when the memory subsystem is scaled up.
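The thread hierarchy summarised above (threads grouped into warps, warps grouped into CTAs) can be sketched with simple index arithmetic. The sizes used here are illustrative assumptions (32 threads per warp, as on NVIDIA hardware, and a hypothetical 256-thread CTA), not figures from the paper:

```python
# Illustrative sketch of the SIMT thread hierarchy: threads -> warps -> CTAs.
# WARP_SIZE = 32 matches NVIDIA GPUs; CTA_SIZE = 256 is a hypothetical choice.
WARP_SIZE = 32
CTA_SIZE = 256  # threads per CTA (assumed for illustration)

def locate_thread(global_tid: int) -> dict:
    """Map a global thread id to its CTA, warp within that CTA, and lane."""
    cta_id = global_tid // CTA_SIZE
    tid_in_cta = global_tid % CTA_SIZE
    warp_id = tid_in_cta // WARP_SIZE   # which warp inside the CTA
    lane_id = tid_in_cta % WARP_SIZE    # position inside the warp
    return {"cta": cta_id, "warp": warp_id, "lane": lane_id}

# Example: global thread 300 lands in CTA 1, warp 1, lane 12.
print(locate_thread(300))
```

All threads sharing the same `cta` value would be scheduled on one shader core, while only one warp of that CTA executes at any instant, which is why adding cores (more CTAs in flight) is the GPU's other axis of parallelism.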
