TF-Net

Jiecao Yu,Scott Mahlke,Andrew Lukefahr,Reetuparna Das

doi:10.1145/3358189

Abstract

Deep Neural Networks (DNNs) have become an essential component of various applications. While today’s DNNs are mainly restricted to cloud services, network connectivity, energy, and data privacy problems make it important to support efficient DNN computation on low-cost, low-power processors like microcontrollers. However, due to the constrained computation resources, it is challenging to execute large DNN models on microcontrollers. Using sub-byte low-precision input activations and weights is a typical method to reduce DNN computation. But on byte-addressable microcontrollers, the sub-byte computation is not well supported. The sub-byte inputs and weights need to be unpacked from bitstreams before computation, which incurs significant computation and energy overhead. In this paper, we propose the TF-Net pipeline to efficiently deploy sub-byte DNNs on microcontrollers. While TF-Net allows for a range of weight and input precision, we find Ternary weights and Four-bit inputs provide the optimal balance between model accuracy, computation performance, and energy efficiency. TF-Net first includes a training framework for sub-byte low-precision DNN models. Two algorithms are then introduced to accelerate the trained models. The first, direct buffer convolution, amortizes unpacking overhead by caching unpacked inputs. The second, packed sub-byte multiply-accumulate, utilizes a single multiplication instruction to perform multiple sub-byte multiply-accumulate computations. To further accelerate DNN computation, we propose two instructions, Multiply-Shift-Accumulate and Unpack, to extend the existing microcontroller instruction set. On the tested networks, TF-Net can help improve the computation performance and energy efficiency by 1.83× and 2.28× on average, respectively.

Full Text