A Ternary Neural Network with Compressed Quantized Weight Matrix for Low Power Embedded Systems

S N Truong

doi:10.48084/etasr.4758

Abstract

In this paper, we propose a method of transforming a real-valued matrix to a ternary matrix with controllable sparsity. The sparsity of quantized weight matrices can be controlled by adjusting the threshold during the training and quantizing process. A 3-layer ternary neural network was trained with the MNIST dataset using the proposed adjustable dynamic threshold. The sparsity of the quantized weight matrices varied from 0.1 to 0.6 and the obtained recognition rate reduced from 91% to 88%. The sparse weight matrices were compressed by the compressed sparse row format to speed up the ternary neural network, which can be deployed on low-power embedded systems, such as the Raspberry Pi 3 board. The ternary neural network with the sparsity of quantized weight matrices of 0.1 is 4.24 times faster than the ternary neural network without compressing weight matrices. The ternary neural network is faster as the sparsity of quantized weight matrices increases. When the sparsity of the quantized weight matrices is as high as 0.6, the recognition rate degrades by 3%, however, the speed is 9.35 times the ternary neural network's without compressing quantized weight matrices. Ternary neural network work with compressed sparse matrices is feasible for low-cost, low-power embedded systems.

Full Text