Abstract

As the number of on-chip cores increases, scalable on-chip topologies such as meshes inevitably add multiple hops to each network traversal. The best practice today is to design one-cycle routers, such that the low-load network latency between a source and destination is equal to the number of routers and links (that is, twice the hop count) between them. Designers of operating systems, compilers, and cache coherence protocols often try to limit communication to within a few hops because on-chip latency is critical for their scalability. In this article, the authors propose an on-chip network called Smart (Single-cycle Multihop Asynchronous Repeated Traversal) that aims to present a single-cycle datapath all the way from the source to the destination. They do not add fast physical express links to the datapath; instead, they drive the shared crossbars and links asynchronously across multiple hops within a single cycle. They designed a router and link microarchitecture to achieve such a traversal, and a flow-control technique to arbitrate and set up multihop paths within a cycle. A place-and-route design at 45 nm achieves 11 hops within a 1-GHz cycle for paths without turns (9 hops for paths with turns). The authors observe a 5- to 8-fold reduction in low-load latencies across synthetic traffic patterns on an 8×8 chip multiprocessor, compared to a baseline one-cycle router network. Full-system simulations with Splash-2 and Parsec benchmarks demonstrate 27 and 52 percent reductions in runtime for private and shared level-2 designs, respectively.
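
The latency arithmetic in the abstract can be illustrated with a small sketch. The baseline formula (one cycle per router and per link, so twice the hop count) comes directly from the text. The Smart side is a deliberately idealized assumption: it counts only datapath traversal cycles when at most `hpc_max` hops fit in one cycle, and it ignores path setup, arbitration, and contention. All function and parameter names are hypothetical, and the numbers are not meant to reproduce the reported results.

```python
import math


def manhattan_hops(src, dst):
    """Hop count between two (x, y) mesh coordinates under minimal routing."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def baseline_latency(hops):
    """Low-load latency in cycles for a one-cycle-router mesh.

    Each hop costs one router cycle plus one link cycle, as stated
    in the abstract (latency = twice the hops).
    """
    return 2 * hops


def smart_traversal_cycles(hops, hpc_max):
    """Idealized traversal cycles when up to hpc_max hops fit in one cycle.

    Assumption for illustration only: setup, arbitration, and
    contention are ignored, so this is a lower bound on the datapath
    traversal, not the authors' full latency model.
    """
    if hpc_max <= 0:
        raise ValueError("hpc_max must be positive")
    return math.ceil(hops / hpc_max)


# Corner-to-corner route on an 8x8 mesh (a path that includes a turn).
hops = manhattan_hops((0, 0), (7, 7))
print(hops, baseline_latency(hops), smart_traversal_cycles(hops, hpc_max=9))
```

Under these assumptions, a corner-to-corner route on the 8×8 mesh spans 14 hops: 28 cycles in the baseline, versus 2 traversal cycles if 9 hops can be covered per cycle. A straight 11-hop path would fit in a single traversal cycle at the 11-hop limit quoted for paths without turns.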
