Abstract

The growing parallelism in most of today's applications has led to an increased demand for parallel computing in processors. General-Purpose Graphics Processing Units (GPGPUs) have been used extensively to support highly parallel applications in recent years. Such GPGPUs generate huge volumes of network traffic between memory controllers (MCs) and shader cores. As a result, the network-on-chip (NoC) fabric can become a performance bottleneck, especially for memory-intensive applications running on GPGPUs. Traditional mesh-based NoC topologies are not well suited to GPGPUs because they incur high network latency, which leads to congestion at the MCs and longer application execution times. In this article, we propose a novel memory-aware NoC with two planes (request and reply) tailored to exploit the traffic characteristics of GPGPUs. The request plane consists of low-power, low-latency routers optimized for the many-to-few traffic pattern. In the reply plane, flits are sent over fast overlay circuits and reach their destinations in just three cycles (at 1 GHz). In addition, because traditional memory controllers are unaware of application memory intensity, which leads to longer waiting times on the shader cores, we propose an enhanced memory controller that prioritizes burst packets to improve application performance on GPGPUs. Experimental results indicate that our framework yields an improvement of 4-10× in NoC latency, up to 63 percent in execution time, and up to 4× in total energy consumption compared to the state-of-the-art.
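To make the burst-prioritization idea concrete, the following is a minimal sketch, not the paper's implementation, of a memory-controller scheduler that drains burst packets before regular requests. The names MemRequest, is_burst, and BurstAwareScheduler are illustrative assumptions and do not appear in the article.

# Illustrative sketch only: a toy memory-controller scheduler that serves
# burst packets ahead of single requests. Class and field names are
# hypothetical, not taken from the proposed design.
from collections import deque
from dataclasses import dataclass
from typing import Optional

@dataclass
class MemRequest:
    core_id: int
    address: int
    is_burst: bool  # True if the request belongs to a multi-packet burst

class BurstAwareScheduler:
    """Drain burst requests first; fall back to FCFS for regular requests."""

    def __init__(self) -> None:
        self.burst_q: deque[MemRequest] = deque()   # high-priority queue
        self.normal_q: deque[MemRequest] = deque()  # low-priority queue

    def enqueue(self, req: MemRequest) -> None:
        (self.burst_q if req.is_burst else self.normal_q).append(req)

    def next_request(self) -> Optional[MemRequest]:
        # Burst packets are issued before any regular packet.
        if self.burst_q:
            return self.burst_q.popleft()
        if self.normal_q:
            return self.normal_q.popleft()
        return None

if __name__ == "__main__":
    sched = BurstAwareScheduler()
    sched.enqueue(MemRequest(core_id=0, address=0x100, is_burst=False))
    sched.enqueue(MemRequest(core_id=1, address=0x200, is_burst=True))
    print(sched.next_request())  # the burst request from core 1 is served first

In this simplified policy, strict priority for burst traffic shortens the waiting time of memory-intensive kernels; the actual controller in the article may combine this with other scheduling constraints (e.g., row-buffer locality) that the sketch omits.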
