Towards Memory-Efficient Allocation of CNNs on Processing-in-Memory Architecture

Yi Wang,Weixuan Chen,Jing Yang,Tao Li

doi:10.1109/tpds.2018.2791440

Abstract

Convolutional neural networks (CNNs) have been successfully applied in artificial intelligent systems to perform sensory processing, sequence learning, and image processing. In contrast to conventional computing-centric applications, CNNs are known to be both computationally and memory intensive. The computational and memory resources of CNN applications are mixed together in the network weights. This incurs a significant amount of data movement, especially for high-dimensional convolutions. The emerging Processing-in-Memory (PIM) alleviates this memory bottleneck by integrating both processing elements and memory into a 3D-stacked architecture. Although this architecture can offer fast near-data processing to reduce data movement, memory is still a limiting factor of the entire system. We observe that an unsolved key challenge is how to efficiently allocate convolutions to 3D-stacked PIM to combine the advantages of both neural and computational processing. This paper presents MemoNet , a memo ry-efficient data allocation strategy for convolutional neural net works on 3D PIM architecture. MemoNet offers fine-grained parallelism that can fully exploit the computational power of PIM architecture. The objective is to capture the characteristics of neural network applications and perfectly match the underlining hardware resources provided by PIM, resulting in a hardware-independent design to transparently allocate data. We formulate the target problem as a dynamic programming model and present an optimal solution. To demonstrate the viability of the proposed MemoNet, we conduct a set of experiments using a variety of realistic convolutional neural network applications. The extensive evaluations show that, MemoNet can significantly improve the performance and the cache utilization compared to representative schemes.

Full Text