Emerging non-volatile memory (NVM) devices, such as STT-MRAM, PCM, and RRAM, have been explored for embedded memory and storage applications to replace CMOS-based SRAM/DRAM and Flash devices. Recently, many of these memory devices have been utilized for new computing paradigms beyond Boolean logic and von Neumann architectures. For example, in-memory analog computing reduces data movement between computing and memory units and exploits the intrinsic parallelism of memory arrays. It finds a natural application in deep neural network (DNN) accelerators by implementing high-throughput, high-efficiency multiply-accumulate (MAC) operations. Here the conductances of memory devices in a crossbar array represent DNN weights, and the activations are encoded in input electrical signals (e.g., pulse height or duration). The MAC operation is carried out via Ohm’s law (multiplication of voltage and conductance) and Kirchhoff’s current law (accumulation via current summation) in constant time, even for very large networks. DNNs have surpassed human performance in various AI applications, e.g., image classification and natural language processing. While general-purpose CPUs/GPUs and special-purpose digital accelerators provide current and near-term DNN hardware, there are longer-term opportunities for analog DNN accelerators based on emerging memory devices to achieve significantly higher performance and energy efficiency. At the same time, analog accelerators impose new requirements on these devices beyond those of traditional memory applications, e.g., analog tunability, gradual and symmetric weight modulation, and high precision. Memory devices whose physical mechanisms are analog in nature (e.g., filament growth in RRAM) may be optimized to meet these requirements, while some abrupt and asymmetric characteristics (e.g., filament rupture) present challenges.
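The crossbar MAC operation described above can be illustrated with a minimal numerical sketch. It assumes an idealized array (no noise, drift, or wire resistance) and represents signed weights with a pair of conductances per weight; the function names, the `g_max` value, and the voltage range are hypothetical choices for illustration and are not taken from the accelerator discussed in this talk.

```python
import numpy as np

def weights_to_conductances(weights, g_max=25e-6):
    """Map signed weights to a (G+, G-) conductance pair.

    g_max (siemens) is a hypothetical maximum device conductance.
    """
    scale = g_max / np.max(np.abs(weights))
    g_plus = np.clip(weights, 0, None) * scale    # positive part of each weight
    g_minus = np.clip(-weights, 0, None) * scale  # negative part of each weight
    return g_plus, g_minus, scale

def crossbar_mac(voltages, g_plus, g_minus):
    """Column currents of an idealized crossbar.

    Ohm's law gives the current through each device (V_i * G_ij);
    Kirchhoff's current law sums those currents along each column wire.
    """
    i_plus = voltages @ g_plus
    i_minus = voltages @ g_minus
    return i_plus - i_minus

rng = np.random.default_rng(0)
weights = rng.normal(size=(8, 4))            # 8 inputs, 4 outputs
activations = rng.uniform(0.0, 0.2, size=8)  # input voltages (assumed range)

g_plus, g_minus, scale = weights_to_conductances(weights)
currents = crossbar_mac(activations, g_plus, g_minus)

# Rescaling the currents recovers the digital dot product.
recovered = currents / scale
```

In this idealized model, `recovered` matches `activations @ weights` up to floating-point error. In hardware, all column currents are produced in parallel in a single read step, which is why the latency does not grow with array size.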
Increasingly large neural network models have been demonstrated on these memory arrays designed as analog accelerators, but they are still orders of magnitude smaller than state-of-the-art DNN models. While analog accelerators enable massively parallel computation, they are also susceptible to challenges unique to analog devices and circuitry (e.g., device variability and circuit noise), which may degrade network performance (e.g., accuracy). To benefit from the massively parallel MAC operation in analog memory arrays, these arrays need to be large enough to efficiently map the layers of modern DNN models. Among emerging NVM devices, PCM has the advantages of maturity and the availability of large-scale arrays, but it also faces challenges in device characteristics, e.g., conductance drift, asymmetry, and noise. PCM-based analog DNN accelerators have been demonstrated at advanced technology nodes with millions of devices and have achieved iso-accuracy on increasingly large network models. These accelerators integrate highly efficient analog PCM tiles for MAC operations with advanced CMOS circuitry for auxiliary digital functions. While material and device engineering continues to be explored to improve the analog properties of PCM devices, design and operation innovations can also help to improve the performance of PCM-based DNN weights, e.g., multiple-device-per-weight designs and closed-loop tuning. In addition, circuit innovations are essential for analog accelerator performance. Fig. 1 shows a 14nm PCM-based DNN inference accelerator, which incorporates design techniques such as 4-PCM weight units, a 2D mesh for tile-to-tile communication, and pulse-duration-based coding. On top of technology and design innovations, some DNN models can also be modified to be more resilient against hardware imperfections and noise. PCM-based analog accelerators have achieved iso-accuracy on large DNN models with millions of weights.
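As a rough illustration of closed-loop tuning, the sketch below programs a single simulated device toward a target conductance with a read, compare, and correct loop. The device model (normalized conductance in [0, 1], multiplicative write noise, additive read noise) and every parameter value are assumptions made for illustration; they do not describe the programming scheme used on the accelerator in Fig. 1.

```python
import numpy as np

def closed_loop_tune(target, rng, tol=0.02, max_iters=50,
                     gain=0.5, write_noise=0.2, read_noise=0.005):
    """Iteratively program a toy device toward a target conductance.

    All quantities are in normalized units; the noise model is hypothetical.
    Returns the final conductance and the number of write pulses applied.
    """
    g = 0.0  # start from the low-conductance state
    for step in range(max_iters):
        # Verify: read the device (with read noise) and compare to the target.
        measured = g + rng.normal(0.0, read_noise)
        error = target - measured
        if abs(error) <= tol:
            return g, step
        # Correct: apply a pulse whose intended effect is proportional to the
        # error; the actual change is perturbed by multiplicative write noise.
        g += gain * error * (1.0 + rng.normal(0.0, write_noise))
        g = float(np.clip(g, 0.0, 1.0))
    return g, max_iters

rng = np.random.default_rng(1)
target = 0.6
g_final, pulses = closed_loop_tune(target, rng)
```

Because each pulse is followed by a verification read, the loop can converge even when individual writes are imprecise, at the cost of extra programming time per device.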
This talk will discuss the progress that we have achieved on PCM-based analog DNN inference accelerators, the challenges of PCM materials and devices, and promising solutions in technology and design.
Figure 1