Abstract

Driven by the need to compress neural network (NN) weights, which is especially beneficial for edge devices with constrained resources, and by the need to use the simplest possible quantization model, in this paper we study the performance of three-bit post-training uniform quantization. The goal is to gather various choices of the key parameter of the quantizer in question (the support region threshold) in one place and to provide a detailed overview of the impact of this choice on the performance of post-training quantization for the MNIST dataset. Specifically, we analyze whether the accuracy of two NN models (an MLP and a CNN) can be preserved to a great extent with the very simple three-bit uniform quantizer, regardless of the choice of the key parameter. Moreover, we aim to answer whether it is of the utmost importance in post-training three-bit uniform quantization, as it is in classical quantization, to determine the optimal support region threshold value of the quantizer in order to achieve some predefined accuracy of the quantized neural network (QNN). The results show that the choice of the support region threshold value of the three-bit uniform quantizer does not have a strong impact on the accuracy of the QNNs, unlike in two-bit uniform post-training quantization applied in the MLP for the same classification task. Accordingly, one can anticipate that, owing to this special property, the post-training quantization model in question can be widely exploited.
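To make the role of the support region threshold concrete, the following is a minimal sketch of a symmetric three-bit (eight-level, mid-rise) uniform quantizer as commonly defined in the quantization literature; it is an illustrative assumption, not the paper's exact implementation, and the function name and parameters are hypothetical. The threshold `t` sets the support region [-t, t]: weights are clipped to it and mapped to the midpoint of one of the 2^3 = 8 uniform cells.

```python
import numpy as np

def uniform_quantize(w, t, bits=3):
    """Symmetric mid-rise uniform quantizer (illustrative sketch).

    w    : array of weights to quantize
    t    : support region threshold; weights are clipped to [-t, t]
    bits : bit rate (3 bits -> 8 reconstruction levels)
    """
    n_levels = 2 ** bits          # 8 levels for three bits
    step = 2 * t / n_levels       # uniform step size over [-t, t]
    w_clipped = np.clip(w, -t, t)
    # index of the quantization cell each weight falls into
    idx = np.floor((w_clipped + t) / step)
    idx = np.clip(idx, 0, n_levels - 1)  # keep boundary value t in the top cell
    # reconstruct at cell midpoints
    return -t + (idx + 0.5) * step
```

Varying `t` trades off clipping distortion (small `t` truncates large weights) against granular distortion (large `t` widens the step size); the paper's finding is that at three bits the QNN accuracy is fairly insensitive to this trade-off.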

Highlights

  • Neural networks (NNs) have achieved remarkable success in a wide range of real-world applications

  • We have shown that when a three-bit uniform quantizer (UQ) is utilized for post-training quantization, the accuracies of two NNs (an MLP and a convolutional neural network (CNN)) that we have pretrained on the MNIST dataset can be preserved for various choices of the key parameter of the quantizer in question

  • We have shown that in post-training three-bit uniform quantization, for both NN models (MLP and CNN) and for two datasets (MNIST and Fashion-MNIST), it is not of utmost importance, as it is in classical quantization, to determine the optimal support region threshold value of the UQ to achieve some predefined accuracy of the quantized neural network (QNN)


Introduction

Neural networks (NNs) have achieved remarkable success in a wide range of real-world applications. However, their application may be limited or impeded on edge devices with constrained resources, such as IoT and mobile devices [1,2,3,4,5,6]. On such resource-constrained devices, decreased storage and/or computational costs for NNs are indispensable, yet the accuracy of an NN can be severely degraded if the pathway toward this decrease is not chosen prudently [2,4,6]. Edge computing extends the cloud, bringing it as close as possible to heterogeneous end devices or end users [1,3].

