Abstract

This paper considers the design of a binary scalar quantizer for a Laplacian source and its application in compressed neural networks. The quantizer's performance is investigated over a wide dynamic range of data variances, and for that purpose we derive novel closed-form expressions. Moreover, we propose two selection criteria for the variance range of interest. The binary quantizer is then applied to compress neural network weights, and its performance is analysed on a simple classification task. Good agreement between theory and experiment is observed, indicating strong potential for practical implementation.
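
As background for the abstract's claims, the standard mean-squared-error (MSE) design of a symmetric binary quantizer for a zero-mean Laplacian source can be sketched as follows; the paper's novel closed-form expressions extend this analysis to a wide range of variances and are not reproduced here. With threshold at zero and representation levels ±y1, the distortion is

\[
D(\sigma, y_1) = \mathrm{E}\!\left[(X - Q(X))^2\right] = \sigma^2 - 2\,y_1\,\mathrm{E}|X| + y_1^2 = \sigma^2 - \sqrt{2}\,\sigma\,y_1 + y_1^2 ,
\]

since E|X| = σ/√2 for a zero-mean Laplacian source with variance σ². Minimizing over y1 gives y1 = σ/√2, hence D_min = σ²/2 and a matched signal-to-quantization-noise ratio of 10·log10(2) ≈ 3.01 dB.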

Highlights

  • Artificial neural networks (NNs) have become an attractive research field in recent decades for resolving different challenges due to the increasing availability of powerful hardware [1]

  • The goal of the experimental section is to verify the theoretical analysis provided in Section III by applying the binary quantizer to the weights of a neural network

Summary

Introduction

Artificial neural networks (NNs) have become an attractive research field in recent decades for resolving a variety of challenges, owing to the increasing availability of powerful hardware [1]. The most significant achievements have been obtained in tasks such as image classification [2], object recognition [3], and speech processing [4]. NNs have also been applied in other fields, where promising results have been achieved [5]–[7]. The improved performance (i.e., a high prediction accuracy level) has often been obtained with very complex NN architectures that involve a large number of parameters and substantial computational and storage resources. This, in turn, can be a limiting factor for the application of NNs in portable and edge-computing devices with limited memory and processing power, or in latency-critical services. To reduce these costs, NN parameters (weights, activations, etc.), usually represented in 32-bit floating-point format (full precision), are mapped to fixed-point representations with lower bit lengths.
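
To make the quantization step concrete, below is a minimal sketch in Python (NumPy) of 1-bit weight quantization. It assumes, consistent with the derivation above, that the representation levels are the MSE-optimal ±σ/√2 for a zero-mean Laplacian weight distribution; the function name and interface are illustrative, not taken from the paper.

```python
import numpy as np

def binary_quantize(weights: np.ndarray) -> np.ndarray:
    """Map each weight to +/- y1, where y1 = sigma / sqrt(2) is the
    MSE-optimal level for a zero-mean Laplacian source (threshold at 0)."""
    sigma = weights.std()
    y1 = sigma / np.sqrt(2.0)
    return np.where(weights >= 0.0, y1, -y1)

# Example: quantize synthetic Laplacian "weights" and report the SQNR.
rng = np.random.default_rng(0)
w = rng.laplace(loc=0.0, scale=1.0 / np.sqrt(2.0), size=100_000)  # sigma ~ 1
wq = binary_quantize(w)
sqnr_db = 10.0 * np.log10(np.mean(w**2) / np.mean((w - wq)**2))
print(f"SQNR: {sqnr_db:.2f} dB")  # ~3.01 dB for a matched Laplacian source
```

Each quantized weight can then be stored as a single sign bit plus one shared scale y1 per tensor, which is the source of the compression discussed above.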

