Abstract

The requirements of interconnection networks for shared-memory chip multiprocessors (CMPs) differ from those of traditional application-specific networks on chip (NoC), because modern CMP cores inject memory references into the network frequently (up to once per clock cycle) and the latency of these references should be as low as possible. The throughput computing paradigm trades the low-latency requirement for high throughput in CMPs by overlapping memory references from processors with the help of multithreading. To meet the bandwidth requirements of throughput computing CMPs, we have studied the use of d-dimensional sparse meshes and tori. Unfortunately, it has turned out that either there is too much bandwidth, leading to high silicon area and energy consumption, or the links get longer, decreasing the clock rate. In this paper we study the cost of bandwidth-optimized 2-dimensional meshes and tori for CMPs using the throughput computing paradigm. We present the layout, determine link length and node degree, and compare them to those of d-dimensional meshes and tori. For area and power efficiency considerations, we also give estimates of silicon area and power consumption.
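As background for the degree and bandwidth comparison mentioned above, the following is a minimal sketch of the textbook formulas for node degree and bisection width of a k-ary d-dimensional mesh or torus. It is not taken from the paper; the paper's layout-derived link lengths and area/power estimates are not reproduced here.

```python
# Illustrative sketch (not from the paper): standard quantities for a
# k-ary d-dimensional mesh/torus with N = k**d nodes.

def node_degree(d: int) -> int:
    # Each dimension contributes up to 2 links per node (both for a
    # torus node and for an interior mesh node), so the maximum node
    # degree is 2*d.
    return 2 * d

def bisection_width(k: int, d: int, torus: bool) -> int:
    # Cutting the network in half across one dimension severs
    # k**(d-1) links in a mesh; a torus also cuts the wrap-around
    # links, doubling the count.
    return (2 if torus else 1) * k ** (d - 1)

# Example: 64 nodes as an 8x8 2-D torus vs. a 4x4x4 3-D mesh.
print(node_degree(2), bisection_width(8, 2, torus=True))   # degree 4, bisection 16
print(node_degree(3), bisection_width(4, 3, torus=False))  # degree 6, bisection 16
```

The example illustrates the trade-off the abstract alludes to: moving from 2 to d dimensions raises the node degree (and, in a planar layout, the physical link length) while changing how the bisection bandwidth scales with the node count.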
