Abstract

Convolutional Neural Networks (CNNs) are widely deployed in computer vision applications. Their datasets are large, and data reuse across different parts of the computation is heavily interleaved. Given that memory access (SRAM and especially DRAM) is more expensive in both performance and energy than computation, maximizing data reuse to reduce data movement across the memory hierarchy is critical to improving execution efficiency. This is even more important for the common use case of CNNs on mobile devices, where compute and memory resources are limited. We propose CNNFlow, a memory-driven dataflow optimization framework that automatically schedules CNN computation on a given CNN architecture to maximize data reuse at each level of the memory hierarchy. We derive a mathematical formulation of data reuse in terms of scheduling parameters, including loop ordering, blocking, and memory-bank allocation for the tensors in a CNN. We then present a series of techniques that prune the large search space and reduce the cost of the exploration. This provides, for the first time, an exact and practical search algorithm for optimal solutions that minimize memory access cost for CNNs. We demonstrate its efficacy on two widely used CNN models: AlexNet and VGG16, with 5 and 13 convolution layers, respectively. CNNFlow finds the optimal solution for each layer within tens of minutes of compute time. Its solution requires about 20% fewer DRAM accesses and 40%–80% fewer SRAM accesses than state-of-the-art algorithms in the literature.
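To make the scheduling knobs concrete, the sketch below shows a convolution layer written as a loop nest in two forms: an untiled reference and a blocked version whose tile sizes and loop order determine how often each tensor tile must be re-fetched from slow memory. This is a minimal illustration of the kind of schedule CNNFlow searches over, not the framework itself; all dimensions, tile sizes (TK/TC/TH/TW), and function names are hypothetical, and the tile sizes are assumed to divide the layer dimensions evenly.

```c
/* Minimal sketch of loop ordering and blocking for a convolution layer.
 * Hypothetical dimensions and tile sizes; not CNNFlow's actual schedule. */
#include <stdio.h>
#include <math.h>

#define K 8   /* output channels */
#define C 4   /* input channels  */
#define H 8   /* output height   */
#define W 8   /* output width    */
#define R 3   /* filter height   */
#define S 3   /* filter width    */

#define TK 4  /* tile sizes: the "blocking" parameters in the search space */
#define TC 2
#define TH 4
#define TW 4

static float in[C][H + R - 1][W + S - 1];
static float wt[K][C][R][S];
static float out_ref[K][H][W], out_tiled[K][H][W];

/* Untiled reference loop nest. */
static void conv_ref(void) {
    for (int k = 0; k < K; k++)
        for (int c = 0; c < C; c++)
            for (int h = 0; h < H; h++)
                for (int w = 0; w < W; w++)
                    for (int r = 0; r < R; r++)
                        for (int s = 0; s < S; s++)
                            out_ref[k][h][w] += wt[k][c][r][s] * in[c][h + r][w + s];
}

/* Blocked version: the outer loops walk tiles (modeling DRAM-to-SRAM
 * transfers); the inner loops reuse the resident input/weight tile.
 * Loop order plus tile sizes determine each tensor's re-fetch count,
 * which is the cost a memory-driven scheduler minimizes. */
static void conv_tiled(void) {
    for (int k0 = 0; k0 < K; k0 += TK)
        for (int c0 = 0; c0 < C; c0 += TC)
            for (int h0 = 0; h0 < H; h0 += TH)
                for (int w0 = 0; w0 < W; w0 += TW)
                    for (int k = k0; k < k0 + TK; k++)
                        for (int c = c0; c < c0 + TC; c++)
                            for (int h = h0; h < h0 + TH; h++)
                                for (int w = w0; w < w0 + TW; w++)
                                    for (int r = 0; r < R; r++)
                                        for (int s = 0; s < S; s++)
                                            out_tiled[k][h][w] +=
                                                wt[k][c][r][s] * in[c][h + r][w + s];
}

int main(void) {
    /* Deterministic initialization so the check is reproducible. */
    for (int c = 0; c < C; c++)
        for (int h = 0; h < H + R - 1; h++)
            for (int w = 0; w < W + S - 1; w++)
                in[c][h][w] = (float)((c + h + w) % 7) - 3.0f;
    for (int k = 0; k < K; k++)
        for (int c = 0; c < C; c++)
            for (int r = 0; r < R; r++)
                for (int s = 0; s < S; s++)
                    wt[k][c][r][s] = (float)((k + c + r + s) % 5) - 2.0f;

    conv_ref();
    conv_tiled();

    /* Blocking changes only the visit order, never the result. */
    for (int k = 0; k < K; k++)
        for (int h = 0; h < H; h++)
            for (int w = 0; w < W; w++)
                if (fabsf(out_ref[k][h][w] - out_tiled[k][h][w]) > 1e-4f) {
                    printf("mismatch at (%d,%d,%d)\n", k, h, w);
                    return 1;
                }
    printf("tiled and reference convolutions agree\n");
    return 0;
}
```

Because every legal loop order and tile-size assignment yields the same numerical result, a scheduler is free to pick the combination that minimizes DRAM and SRAM traffic; the abstract's search space is exactly this family of equivalent loop nests, extended with memory-bank allocation.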
