Prediction router: Yet another low latency on-chip router architecture

Hiroki Matsutani,Hideharu Amano,Michihiro Koibuchi,Tsutomu Yoshinaga

doi:10.1109/hpca.2009.4798274

Hiroki Matsutani, Hideharu Amano + Show 2 more

Open Access

https://doi.org/10.1109/hpca.2009.4798274

Copy DOI

Abstract

Network-on-Chips (NoCs) are quite latency sensitive, since their communication latency strongly affects the application performance on recent many-core architectures. To reduce the communication latency, we propose a low-latency router architecture that predicts an output channel being used by the next packet transfer and speculatively completes the switch arbitration. In the prediction routers, incoming packets are transferred without waiting the routing computation and switch arbitration if the prediction hits. Thus, the primary concern for reducing the communication latency is the hit rates of prediction algorithms, which vary from the network environments, such as the network topology, routing algorithm, and traffic pattern. Although typical low-latency routers that speculatively skip one or more pipeline stages use a bypass datapath for specific packet transfers (e.g., packets moving on the same dimension), our prediction router predictively forwards packets based on a prediction algorithm selected from several candidates in response to the network environments. In this paper, we analyze the prediction hit rates of six prediction algorithms on meshes, tori, and fat trees. Then we provide three case studies, each of which assumes different many-core architecture. We have implemented a prediction router for each case study by using a 65 nm CMOS process, and evaluated them in terms of the prediction hit rate, zero load latency, hardware amount, and energy consumption. The results show that although the area and energy are increased by 6.4-15.9% and 8.0-9.5% respectively, up to 89.8% of the prediction hit rate is achieved in real applications, which provide favorable trade-offs between the modest hardware/energy overheads and the latency saving.

Full Text