Abstract
Analog hardware accelerators, which perform computation within a dense memory array, have the potential to overcome the major bottlenecks faced by digital hardware for data-heavy workloads such as deep learning. Exploiting the intrinsic computational advantages of memory arrays, however, has proven to be challenging principally due to the overhead imposed by the peripheral circuitry and due to the non-ideal properties of memory devices that play the role of the synapse. We review the existing implementations of these accelerators for deep supervised learning, organizing our discussion around the different levels of the accelerator design hierarchy, with an emphasis on circuits and architecture. We explore and consolidate the various approaches that have been proposed to address the critical challenges faced by analog accelerators, for both neural network inference and training, and highlight the key design trade-offs underlying these techniques.
Highlights
Processor performance has advanced at an inexorable pace by riding on continued increases in transistor density, enabled by Dennard scaling, and, more recently, by running many processor cores in parallel
By embedding neural network computations directly inside the memory elements that store the weights, analog neuromorphic accelerators based on non-volatile memory (NVM) arrays can greatly reduce the energy and latency costs associated with data movement
To be useful for neuromorphic computing, non-volatile memory devices must meet a number of requirements that are considerably more stringent than those for storage-class memory,[69] if these devices are to be used for training
Summary
Processor performance has advanced at an inexorable pace by riding on continued increases in transistor density, enabled by Dennard scaling, and, more recently, by running many processor cores in parallel. The so-called memory wall, also known as the von Neumann bottleneck, presents an opportunity for neuromorphic accelerators that can perform computations directly inside the memory array where the network’s parameters are stored. Analog processing inside such an array can inherently parallelize the matrix-algebra computational primitives that underlie many machine learning algorithms. Somewhat differently from recent surveys[14,15,16] of neural network accelerators based on emerging devices, we organize this review around the basic components and ideas that make crossbar-based architectures work. These ideas are partially, but not entirely, agnostic of the specific choice of memory device. In Sec. VIII, we survey some known approaches to combating device- and array-level non-ideal effects using architectural and algorithmic techniques.
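The in-array parallelism described above comes from physics: with weights stored as device conductances, Ohm’s law performs each multiplication and Kirchhoff’s current law performs each summation, so an entire matrix-vector product completes in one read step. The sketch below is an illustrative NumPy simulation of this idea, not code from the review; the differential conductance encoding (a G+/G− pair per weight) and the multiplicative read-noise model are common conventions in this literature, but the function name, scaling, and noise level are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def crossbar_mvm(weights, x, g_max=1e-4, noise_sigma=0.02):
    """Simulate an analog matrix-vector multiply on an NVM crossbar.

    Each weight is encoded as a pair of conductances (G+ minus G-) so
    that negative values can be represented; device read noise is
    modeled as a multiplicative Gaussian perturbation per conductance.
    """
    w_max = np.max(np.abs(weights))
    scale = g_max / w_max                        # weight -> conductance mapping
    g_pos = np.clip(weights, 0, None) * scale    # positive-weight devices
    g_neg = np.clip(-weights, 0, None) * scale   # negative-weight devices
    # multiplicative read noise on each device (assumed noise model)
    g_pos = g_pos * (1 + noise_sigma * rng.standard_normal(g_pos.shape))
    g_neg = g_neg * (1 + noise_sigma * rng.standard_normal(g_neg.shape))
    # Ohm's law (G * V per device) + Kirchhoff's law (currents sum on a line)
    i_out = g_pos @ x - g_neg @ x
    return i_out / scale                         # currents back to weight units

W = np.array([[0.5, -0.2],
              [0.1,  0.8]])
x = np.array([1.0, 2.0])
print(crossbar_mvm(W, x))   # close to the ideal product W @ x = [0.1, 1.7]
```

The noisy output deviates slightly from the ideal product, which is exactly the device non-ideality that the architectural and algorithmic mitigation techniques surveyed in the review aim to tolerate.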