Domain wall memory (DWM) is a recently developed spin-based memory technology in which several bits of data are densely packed into the domains of a ferromagnetic wire. DWM has shown great promise in enabling non-volatile memory with very high density and energy efficiency, and has been explored for secondary storage and off-chip memory. In this work, we explore the use of DWM within the on-chip cache hierarchy of general purpose computing platforms. Our work is motivated by the fact that DWMs enable much higher density compared to SRAM, DRAM, and other spin-based memory technologies such as STT-MRAM. However, DWMs also pose the unique challenge of serial access to the bits stored in a cell, leading to large and variable access latencies. In addition, DWMs share the inherent write inefficiency of other spin-based memories. We propose TapeCache, a DWM-based cache design that employs device, circuit, and architectural techniques to address these challenges. At the device level, we perform write optimization by employing a new write mechanism based on domain wall shifts to achieve fast, energy-efficient writes in DWM. At the circuit level, we propose different DWM bit-cell designs that are tailored to the distinct architectural requirements of different levels in the cache hierarchy. At the architecture level, we propose a new cache organization and suitable management policies that mitigate the performance penalty arising from serial access to bits in a DWM cell. We show that the holistic device-circuit-architecture co-design enables all the levels in the cache hierarchy to be realized using DWM and benefit from its improved density. Over a wide range of SPEC CPU 2006 benchmarks, TapeCache achieves an average energy improvement of 7.5$\times$ and an area improvement of 7.8$\times$, with virtually identical performance, compared to an iso-capacity SRAM cache. Compared to an iso-capacity STT-MRAM cache, TapeCache obtains a 3.1$\times$ improvement in area and 2$\times$ average energy savings, along with a 1.1 percent performance improvement.