Extending the Performance and Energy-Efficiency of Shared Memory Multicores with Nanophotonic Technology

Randy Morris, Avinash Karanth Kodi, Evan Jolley

doi:10.1109/tpds.2013.26

Abstract

As the number of cores on a single chip increases exponentially, the design and integration of both the on-chip network that facilitates intercore communication and the cache coherence protocol that enables shared memory programming have become critical to energy-efficiency and overall chip performance. With traditional metal interconnects facing stringent energy constraints, researchers are currently pursuing disruptive solutions such as nanophotonics for improved energy-efficiency. Cache coherence in multicores can be enforced effectively by snoopy protocols; however, broadcasting every cache miss can limit scalability and consume excess energy. In this paper, we propose PULSE, a nanophotonic broadcast tree-based network for snoopy cache coherent multicores. To limit the energy penalty of broadcasting (and thereby splitting) optical signals, we direct the optical signal from the external laser such that only a subset of requesters can receive it. Furthermore, as cache blocks are shared by only a few cores, we propose a multicast version of PULSE, called multi-PULSE, that predicts the sharers for each L2 miss and morphs the broadcast network into a multicast network. We evaluate the energy and performance of 16-core and 64-core versions of PULSE and multi-PULSE using CACTI and SIMICS on the Splash-2, PARSEC, and SPEC CPU2006 benchmarks, and compare them with electrical networks, optical networks, and other cache filtering techniques. Our results indicate that PULSE outperforms competitive electrical/optical networks by 60 percent in terms of execution time, and that multi-PULSE reduces average energy by 10 to 80 percent, even with a few mispredictions.
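The abstract does not describe how multi-PULSE predicts sharers, so the following is only a minimal Python sketch of the general idea: a generic last-sharers destination-set predictor that turns a broadcast into a multicast and falls back to a full broadcast on a misprediction. It assumes an ideal 1:k optical splitter, where each receiver gets 1/k of the injected power, so the laser power needed to reach k receivers grows linearly with k (all other optical losses are ignored). The names `SharerPredictor`, `laser_power`, `service_miss`, the 64-byte block size, and the cost units are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class SharerPredictor:
    """Hypothetical predictor: remembers the last observed sharers of each block."""
    num_cores: int
    block_bytes: int = 64  # assumed cache block size
    table: dict[int, frozenset[int]] = field(default_factory=dict)

    def _index(self, addr: int) -> int:
        # Map a byte address to its cache-block number.
        return addr // self.block_bytes

    def predict(self, addr: int, requester: int) -> frozenset[int]:
        # Without history, predict every other core (i.e., a broadcast).
        everyone = frozenset(range(self.num_cores)) - {requester}
        return self.table.get(self._index(addr), everyone) - {requester}

    def train(self, addr: int, actual_sharers: frozenset[int]) -> None:
        self.table[self._index(addr)] = frozenset(actual_sharers)


def laser_power(num_receivers: int, sensitivity: float = 1.0) -> float:
    # Ideal 1:k splitting: injected power must be k times the per-receiver
    # sensitivity so that every receiver still detects the signal.
    return num_receivers * sensitivity


def service_miss(pred: SharerPredictor, addr: int, requester: int,
                 actual_sharers: frozenset[int],
                 sensitivity: float = 1.0) -> tuple[float, bool]:
    """Return (relative laser cost, mispredicted) for one L2 miss."""
    predicted = pred.predict(addr, requester)
    cost = laser_power(len(predicted), sensitivity)
    mispredicted = not actual_sharers <= predicted
    if mispredicted:
        # Retry as a full broadcast so that no sharer is missed.
        everyone = frozenset(range(pred.num_cores)) - {requester}
        cost += laser_power(len(everyone), sensitivity)
    pred.train(addr, actual_sharers)
    return cost, mispredicted
```

On a 16-core system, the first miss to a block has no history and costs a 15-receiver broadcast. A repeat miss with the same two sharers costs only 2 units, while a miss that gains a third sharer is mispredicted and pays 2 + 15 = 17 units. This illustrates why a multicast scheme saves energy when predictions are mostly correct, and why each misprediction carries an extra broadcast penalty.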
