Abstract

The graphics processing unit (GPU) is the most promising platform for achieving rapid gains in peak processing speed, low latency, and high performance. The highly programmable, multithreaded nature of GPUs makes them strong candidates for general-purpose computing. However, supporting non-graphics computing on graphics processors requires addressing several architectural challenges. In this paper, we focus on improving performance by better hiding the long latency of transferring data from slow global memory. Furthermore, we show that the proposed method can reduce power and energy consumption. Reducing the access time to off-chip data noticeably reduces waiting time and the percentage of unutilized processing elements. In addition, suitably using processing elements to prefetch data during stall times bridges the memory gap in an energy-efficient manner, which in turn lowers power and energy consumption. Simulation results show that the method can improve instructions per cycle (IPC) by up to 24.76%. Moreover, power, energy, and energy efficiency improve by up to 22.47%, 24.72%, and 36.01%, respectively.
