As technology scaling enables the integration of billions of transistors on a chip, economies of scale are prompting the move toward parallel chip architectures, with both application-specific systems-on-a-chip (SoCs) and general-purpose microprocessors leveraging multiple processing cores on a single chip for better performance at manageable design costs. As these parallel chip architectures scale in size, on-chip networks are emerging as the de facto communication architecture, replacing dedicated interconnects and shared buses. At the same time, tight design constraints in the form of ever-increasing chip-crossing interconnect delays and power consumption are reaching criticality. On-chip networks have to deliver good latency-throughput performance in the face of very tight power and area budgets. The interplay of these two trends makes on-chip network design one of the most challenging and significant design problems system designers face in the near term.

New parallel chip architectures bring about unique delay and bandwidth requirements for on-chip networks that, in many ways, are substantially different from those of traditional multichip/multiboard interconnection networks found in multiprocessors and other "macro" system architectures. The exact requirements depend on the intended application and need to be met with judicious use of precious silicon real estate under a tight power budget capped by battery life, power-delivery limits, and/or thermal characteristics. What's more, the impact on design and verification effort, as well as fault resilience, must also be considered. While the computer industry has, within the past decade, begun introducing on-chip network architectures based on multiple buses (such as ARM's AMBA, IBM's CoreConnect, and Sonics' Smart Interconnect IP) and, more recently, point-to-point switch fabrics (such as CrossBow's 2D mesh Xfabric and Fulcrum's crossbar-centric Nexus), no standards have emerged and none are on the horizon. "Which on-chip network architecture best increases chip functionality while not negatively impacting achievable clock frequency, communication latency, bandwidth, flexibility, design/verification effort, and fault resiliency?" remains an open question.

In this special section, we showcase several major research thrusts in the on-chip networks area. The selected papers can be classified by their targeted chip system: application-specific embedded SoCs versus general-purpose microprocessors. In the former, the availability of application knowledge makes it feasible and effective to tailor the on-chip network architecture toward the characteristics of the particular application(s). In addition, as design time for embedded SoCs critically impacts time-to-market, streamlining and optimizing the design process of the on-chip networks in such systems for flexibility, reuse, and speed is crucial. For general-purpose microprocessors, on the other hand, the break from the traditional single-core architecture opens up the architectural design space, which spawns a different set of requirements (some more relaxed, others more restrictive) for on-chip networks, motivating new network architectures and studies. In both types of systems, on-chip network designers have to grapple with very tight area, power, and wire-delay constraints.

The first two papers target application-specific chip systems. "Joint Application Mapping/Interconnect Synthesis Techniques for Embedded Chip-Scale Multiprocessors," by Neal K. Bambha and Shuvra S. Bhattacharyya, focuses on a specific phase of the synthesis design flow of on-chip networks for application-specific SoCs: the topology mapping phase. Application knowledge is leveraged here to co-optimize application mapping along with topology selection, allowing for irregular topologies. The proposed synthesis algorithm is based on the metric of network hops.
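To make the hop-count metric concrete, the following is a minimal sketch (not taken from the paper) of how a candidate task-to-node mapping might be scored by traffic-weighted network hops on a candidate topology. The function names, the 2x2 mesh, and the traffic figures are illustrative assumptions only.

```python
# Illustrative sketch only: score a candidate task-to-node mapping by total
# network hops, weighted by inter-task traffic. The topology, traffic, and
# mapping below are hypothetical, not drawn from the paper under discussion.
from collections import deque

def hop_count(adj, src, dst):
    """Minimum number of link traversals (hops) from src to dst, via BFS."""
    if src == dst:
        return 0
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                if neighbor == dst:
                    return dist[neighbor]
                queue.append(neighbor)
    return float("inf")  # dst unreachable in this topology

def weighted_hops(adj, traffic, mapping):
    """Sum of (traffic volume x hop count) over all communicating task pairs."""
    return sum(volume * hop_count(adj, mapping[a], mapping[b])
               for (a, b), volume in traffic.items())

# Example: a 2x2 mesh (nodes 0..3) and three tasks with two traffic flows.
mesh_2x2 = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
traffic = {("t0", "t1"): 100, ("t1", "t2"): 40}   # task pair -> volume
mapping = {"t0": 0, "t1": 3, "t2": 1}             # task -> mesh node
print(weighted_hops(mesh_2x2, traffic, mapping))   # 100*2 + 40*1 = 240
```

A mapping/interconnect co-optimization, as pursued in the paper, would then search over mappings and candidate (possibly irregular) topologies to minimize such a cost, presumably alongside other constraints.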
"NoC Synthesis Flow for Customized Domain Specific Multiprocessor Systems-on-Chip," by Davide Bertozzi, Antoine Jalabert, Srinivasan Murali, Rutuparna Tamhankar, Stergios Stergiou, Luca Benini, and Giovanni De Micheli, proposes a design process that provides a complete synthesis flow for on-chip networks. It starts from application specifications, continues through the mapping of the application onto candidate topologies and the selection of a topology, and culminates in the synthesis of router microarchitectures and simulation of the final network design, which allows for further design-space exploration. Area and power budgets given by users guide the synthesis process toward delay and reliability targets. The work demonstrates how the application-specific nature of a class of SoCs allows designers to optimize the on-chip network architecture for the targeted application suites. In addition, automating the design process leads to the realization of flexible network architectures that can be parameterized and fine-tuned for a wide range of applications. Both papers demonstrate the