Abstract

Convolutional Neural Networks (CNNs) have been widely used in many multimedia applications such as image classification, speech recognition, and natural language processing. Nevertheless, the high performance of deep CNN models comes with high computation and storage costs, making it difficult to run CNN models in real-time applications on mobile devices, where computational ability, memory, and power are largely constrained. Binary networks are a recently proposed technique to reduce the computational and memory complexity of CNNs, in which expensive floating-point operations are replaced by much cheaper bit-wise operations. Despite these obvious advantages, only a few works have explored efficient implementations of Binary Neural Networks (BNNs). In this work, we present a general architecture for efficient binary convolution, referred to as BitStream, with a new computation flow for BNNs in place of the traditional row-major im2col-based one. We mainly optimize memory access during BNN computation: the proposed calculation flow is cache friendly and largely reduces the memory overhead of BNNs, yielding memory efficiency and, in turn, computational efficiency. Extensive evaluations on various networks demonstrate the efficiency of the proposed method. For instance, the memory consumption of BitStream during inference is 18-32× lower than that of the original networks and 3× lower than that of existing BNN implementations. Moreover, our implemented binary AlexNet achieves 8.07× and 2.84× speedups over the floating-point precision model and conventional BNN implementations on 8× Cortex-A53 CPUs. With 4× Intel Core i7-6700 CPUs, the binary VGG-like convolutional network on CIFAR-10 even runs 1.69× faster than the floating-point precision version on a Titan X GPU.
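The abstract's core claim, that expensive floating-point operations can be replaced by much cheaper bit-wise operations, rests on the standard XNOR-popcount identity used by BNNs. As a minimal sketch (illustrative only, not the paper's BitStream code): if two {-1, +1} vectors are packed into bit words (1 → +1, 0 → -1), their dot product equals 2·popcount(XNOR(a, b)) − n for n elements.

```python
def popcount(x: int) -> int:
    """Number of set bits in x."""
    return bin(x).count("1")

def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two n-element {-1, +1} vectors packed as bits
    (bit 1 -> +1, bit 0 -> -1), computed without any multiplication:
    dot = 2 * popcount(XNOR(a, b)) - n."""
    xnor = ~(a_bits ^ b_bits) & ((1 << n) - 1)  # XNOR, masked to n bits
    return 2 * popcount(xnor) - n

# Example (MSB first): a = [+1, -1, +1, +1] -> 0b1011,
#                      b = [+1, +1, -1, +1] -> 0b1101
# Float dot product: 1 - 1 - 1 + 1 = 0
print(binary_dot(0b1011, 0b1101, 4))  # -> 0
```

A real implementation would apply this per 64-bit machine word across packed input channels, which is what makes the memory-access pattern of the convolution flow so important.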
