Abstract

Deep convolutional neural networks (CNNs) are widely adopted in intelligent systems, achieving unprecedented accuracy but at the cost of substantial data movement. Although recent developments in processing-in-memory (PIM) architectures seek to minimize data movement by performing computation directly in dedicated nonvolatile memory devices, how to jointly exploit the computational capability of PIM and the highly parallel nature of neural networks remains a critical issue. This paper presents Para-CONV, which exploits the parallelism of deterministic convolutional connections in a PIM architecture. Para-CONV achieves data-level parallelism for convolutions by fully utilizing the on-chip processing engines (PEs) in PIM. The objective is to minimize data movement, in particular the data fetched from off-PE DRAM for inter-PE communication. We formulate this data allocation problem as a dynamic programming model and obtain an optimal solution. Para-CONV is evaluated on a set of benchmarks drawn from both real-life CNN applications and synthetic task graphs. The experimental results show that Para-CONV significantly improves throughput and reduces data movement compared with the baseline scheme.
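The abstract names a dynamic programming formulation but does not spell it out. The following is a minimal illustrative sketch, not the paper's actual model: it assumes a simplified setting in which each convolutional layer's data is allocated to one of several PEs, a fixed transfer cost is paid through off-PE DRAM whenever consecutive layers are mapped to different PEs, and the minimum-cost allocation is found by dynamic programming. All names (`min_cost_allocation`, `local_cost`, `transfer_cost`) are hypothetical.

```python
def min_cost_allocation(local_cost, transfer_cost):
    """Toy DP for data allocation across PEs (illustrative only; not the
    paper's formulation).

    local_cost[l][p]   -- hypothetical cost of computing layer l on PE p
    transfer_cost[l]   -- hypothetical cost of moving layer l's output via
                          off-PE DRAM when layer l+1 sits on a different PE
    Returns (minimum total cost, per-layer PE assignment).
    """
    n_layers = len(local_cost)
    n_pes = len(local_cost[0])

    # dp[p] = best total cost so far with the current layer mapped to PE p
    dp = list(local_cost[0])
    parent = []  # parent[l-1][p] = PE of layer l-1 when layer l is on PE p

    for l in range(1, n_layers):
        best_prev = min(range(n_pes), key=lambda p: dp[p])
        new_dp, choices = [], []
        for p in range(n_pes):
            stay = dp[p]                              # same PE, no transfer
            move = dp[best_prev] + transfer_cost[l - 1]  # fetch via DRAM
            if stay <= move:
                new_dp.append(stay + local_cost[l][p])
                choices.append(p)
            else:
                new_dp.append(move + local_cost[l][p])
                choices.append(best_prev)
        dp = new_dp
        parent.append(choices)

    # Backtrack the optimal per-layer assignment.
    last = min(range(n_pes), key=lambda p: dp[p])
    path = [last]
    for choices in reversed(parent):
        path.append(choices[path[-1]])
    path.reverse()
    return dp[last], path
```

Because each layer's state depends only on the previous layer's placement, the sweep is optimal for this simplified chain-structured cost model; richer task graphs, as in the paper's synthetic benchmarks, would require a correspondingly richer DP state.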
