Large systems-on-chip (SoCs) and chip multiprocessors (CMPs), incorporating tens to hundreds of cores, create a significant integration challenge. Interconnecting a huge amount of architectural modules in an efficient manner, calls for scalable solutions that would offer both high throughput and low-latency communication. The switches are the basic building blocks of such interconnection networks and their design critically affects the performance of the whole system. So far, innovation in switch design relied mostly to architecture-level solutions that took for granted the characteristics of the main building blocks of the switch, such as the buffers, the routing logic, the arbiters, the crossbar's multiplexers, and without any further modifications, tried to reorganize them in a more efficient way. Although such pure high-level design has produced highly efficient switches, the question of how much better the switch would be if better building blocks were available remains to be investigated. In this paper, we try to partially answer this question by explicitly targeting the design from scratch of new soft macros that can handle concurrently arbitration and multiplexing and can be parameterized with the number of inputs, the data width, and the priority selection policy. With the proposed macros, switch allocation, which employs either standard round robin or more sophisticated arbitration policies with significant network-throughput benefits, and switch traversal, can be performed simultaneously in the same cycle, while still offering energy-delay efficient implementations.
Read full abstract