Abstract

Data-intensive workloads and applications, such as machine learning (ML), are fundamentally limited by traditional computing systems based on the von Neumann architecture. As data movement and energy consumption become key bottlenecks in the design of computing systems, interest in unconventional approaches such as Near-Data Processing (NDP), machine learning, and especially neural network (NN)-based accelerators has grown significantly. Emerging memory technologies, such as ReRAM and 3D-stacked memories, are promising candidates for efficiently architecting NDP-based accelerators for NNs, thanks to their ability to work both as high-density/low-energy storage and as in-/near-memory computation and search engines. In this paper, we present a survey of techniques for designing NDP architectures for NNs. By classifying the techniques based on the memory technology employed, we underscore their similarities and differences. Finally, we discuss open challenges and future perspectives that need to be explored in order to improve and extend the adoption of NDP architectures in future computing platforms. This paper will be valuable for computer architects, chip designers, and researchers in the area of machine learning.

Highlights

  • The era of artificial intelligence and big data is introducing new workloads which operate on huge datasets

  • Their dataflow is not optimized to exploit the intrinsic features of a 3D-stacked architecture, and no computations are performed within the DRAM layers, leaving opportunities to further improve the design with PIM capabilities

  • Similar to PRIME and ISAAC, PipeLayer requires a large number of Resistive Random Access Memory (ReRAM) crossbars due to its pipelined execution, and training throughput may be limited by the long write latency and complex re-programming of ReRAM crossbars (see the crossbar sketch after these highlights)
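
The highlights above refer to the in-situ matrix-vector multiplication (MVM) that ReRAM crossbars perform. The following is a minimal, purely illustrative numpy sketch of that operation; the function names, the linear weight-to-conductance mapping, and the conductance range are assumptions made here for illustration, not details taken from PRIME, ISAAC, or PipeLayer.

```python
import numpy as np

def program_crossbar(weights, g_min=1e-6, g_max=1e-4):
    # Map each weight onto a cell conductance in [g_min, g_max].
    # In hardware this is the slow, wear-prone ReRAM write step that
    # the highlight identifies as a bottleneck for training.
    w_min, w_max = weights.min(), weights.max()
    return g_min + (weights - w_min) / (w_max - w_min) * (g_max - g_min)

def crossbar_mvm(conductances, voltages):
    # Analog matrix-vector multiply: driving the wordlines with input
    # voltages makes each bitline current the dot product of the
    # voltages and that column's conductances (Ohm's and Kirchhoff's
    # laws), so an entire MVM completes in a single read operation.
    return voltages @ conductances

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))   # toy layer: 4 inputs, 3 neurons
x = rng.standard_normal(4)

G = program_crossbar(W)      # expensive: one write per cell
i_out = crossbar_mvm(G, x)   # cheap: computed inside the array
```

Pipelined designs such as PipeLayer replicate this programming step across many crossbars, which is why the write latency of ReRAM cells, rather than the MVM itself, tends to dominate training throughput.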


Summary

Introduction

The era of artificial intelligence and big data is introducing new workloads that operate on huge datasets. DNN models tend to be huge: their sizes range from tens to hundreds of megabytes, or even gigabytes, and computing the weighted sum of inputs for each neuron of a given layer requires a large number of data movements between the different levels of the memory hierarchy and the processing units. To exploit the high degree of parallelism of the DNN layers, high memory bandwidth is required to feed multiple processing units with the necessary data. This enormous traffic in the memory hierarchy accounts for a large portion of the energy consumption of any given device and, together with the high memory storage and bandwidth requirements, heavily constrains the efficiency of compute-centric architectures.
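
To make the data-movement argument concrete, the sketch below (plain numpy, with layer sizes chosen arbitrarily for illustration) computes the weighted sums for one fully connected layer and counts the bytes that must cross the memory hierarchy on a compute-centric machine.

```python
import numpy as np

# Hypothetical fully connected layer; the sizes are illustrative only.
n_in, n_out = 4096, 4096
W = np.random.randn(n_out, n_in).astype(np.float32)  # weight matrix
x = np.random.randn(n_in).astype(np.float32)         # input activations

# Weighted sum of inputs for every neuron j: y[j] = sum_i W[j, i] * x[i]
y = W @ x

# Each weight is used exactly once per input vector, so the whole matrix
# must stream from memory to the processing units for every inference:
mib_moved = (W.nbytes + x.nbytes + y.nbytes) / 2**20
print(f"~{mib_moved:.1f} MiB moved for a single layer and input")  # ~64 MiB
```

Repeated over tens of layers and millions of inputs, this traffic becomes the energy and bandwidth bottleneck that NDP architectures target by moving computation to where the weights already reside.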

Background
Conventional Memory Technologies
Near-Data Processing Architectures
Commodity Memory-Based NDP Architectures
Discussion
Neural Cache
Bit Prudent
Coalesce
TETRIS
ReRAM-Based Architectures
FF Subarray
PipeLayer
CASCADE
RAPIDNN
Findings
Conclusions and Future Perspectives