Neuromorphic systems are typically designed as a tile-based architecture where inter-tile data communication is facilitated using a shared global interconnect. Congestion on this interconnect can increase both interconnect energy, which increases the total energy consumption of the hardware and latency, which impacts the performance e.g., accuracy of the application that is being executed on the hardware. Mesh-based Network-on-Chip (NoC) that is used in most hardware prototypes is not the optimal interconnect solution for neuromorphic systems. This is because of the following two reasons. First, power consumption and average latency of a NoC increases exponentially with the number of tiles in the hardware. Second, a NoC cannot exploit an application’s data communication pattern efficiently. Once designed for a target hardware, the bandwidth on each NoC link stays the same, independent of the volume of data traffic between different tile pairs of the NoC. In other words, a NoC cannot be customized at a finer granularity based on an individual application running on the hardware. We show that these NoC limitations prevent opportunities to further improve energy and latency of a neuromorphic hardware. To address these limitations, we propose Dynamic Segmented Bus (SB) interconnect for neuromorphic systems. Here, a bus lane is partitioned into segments with each segment connecting a few tiles. Connection of tiles to segments and those between segments are bridged using our novel three-way segmentation switches that are programmed using the software before admitting an application to the hardware. We partition an application by analyzing its workload and place partitions intelligently onto segments. This exploits application characteristics to use the segments without any routing collisions while exploiting the latency and energy savings in the design-time mapping phase. At a high-level, our mapping algorithm places tiles that communicate the most on shorter segments utilizing fewer number of switches, thereby reducing network congestion. It can adjust the bandwidth by controlling the number of segments connected to a destination tile. At run time, our controller dynamically executes the predefined routing paths without requiring any additionally routing decisions, unlike a NoC. This allows us to improve both energy and latency. Using parallel segmented busses, our proposed interconnect architecture can support a large number of tiles without significantly increasing the design cost, energy, and latency. Simulation results show that compared to the most widely-used mesh-based NoC design, our interconnect architecture, which we call NeuSB, reduces the switch area by 20x, average interconnect energy by 6.2x, and latency by 23%.
Read full abstract