Abstract

Deep Neural Networks (DNNs) have recently shown state-of-the-art results in various applications, such as computer vision and recognition tasks. DNN inference engines can be implemented in hardware with high energy efficiency, as the computation can be realized using low-precision fixed-point or even binary arithmetic with sufficient recognition accuracy. On the other hand, training DNNs with the well-known back-propagation algorithm requires high-precision floating-point computation on a CPU and/or GPU, causing significant power dissipation (more than hundreds of kW) and long training times (several days or more). In this paper, we demonstrate a training chip for machine learning fabricated in a commercial 65-nm CMOS technology. The chip performs training without back-propagation by using invertible logic with stochastic computing, which directly obtains weight values from input/output training data at a low precision suitable for inference. When training neurons that compute the weighted sum of all inputs and then apply a non-linear activation function, our chip reduces power dissipation and latency by 99.98% and 99.95%, respectively, compared with a state-of-the-art software implementation.
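For reference, the neuron model mentioned above is the standard weighted-sum-plus-activation formulation; the notation (inputs x_i, weights w_i, bias b, activation f, number of inputs N) is generic and not taken from the paper itself:

y = f\left( \sum_{i=1}^{N} w_i x_i + b \right)

Here f is a non-linear activation function applied to the weighted sum of the inputs; conventional back-propagation would adjust the w_i by gradient descent, whereas the chip described here obtains them directly from input/output training data via invertible logic with stochastic computing.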
