Abstract

Convolutional neural networks (CNNs) have achieved excellent results in the field of image recognition, which classifies objects in images. A typical CNN consists of a deep architecture that uses a large number of weights and layers to achieve high performance. As a result, a CNN requires relatively large memory space and computational costs, which not only increase the time to train the model but also limit real-time application of the trained model. For this reason, various neural network compression methodologies have been studied to use CNNs efficiently on small embedded hardware such as mobile and edge devices. In this paper, we propose a kernel density estimation based non-uniform quantization methodology that can perform compression efficiently. The proposed method performs efficient weights quantization using a significantly smaller number of sampled weights than the number of original weights. Four-bit quantization experiments on ImageNet classification with various CNN architectures show that the proposed methodology can perform weights quantization efficiently in terms of computational costs without significant reduction in model performance.
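
To make the pipeline described above concrete, the following is a minimal sketch of what a KDE-driven non-uniform quantizer of this kind could look like, assuming SciPy's gaussian_kde and scikit-learn's KMeans. The function name, the subsample size, and the k-means codebook step are illustrative assumptions for a 4-bit setting, not the paper's exact procedure.

```python
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.cluster import KMeans

def kde_km_quantize(weights, n_bits=4, n_samples=2048, seed=0):
    """Illustrative KDE-based non-uniform quantization (KDE-KM flavor).

    1. Draw a small subsample of the layer's weights.
    2. Fit a Gaussian kernel density estimate to the subsample.
    3. Resample from the KDE and run k-means on the resampled points
       to obtain a 2**n_bits-entry codebook.
    4. Map every original weight to its nearest codebook entry.
    """
    rng = np.random.default_rng(seed)
    flat = weights.ravel()

    # Step 1: work with far fewer weights than the layer actually holds.
    sub = rng.choice(flat, size=min(n_samples, flat.size), replace=False)

    # Steps 2-3: fit the KDE, resample from it, and cluster the samples.
    kde = gaussian_kde(sub)
    kde_samples = kde.resample(n_samples, seed=seed).ravel()
    km = KMeans(n_clusters=2 ** n_bits, n_init=10, random_state=seed)
    km.fit(kde_samples.reshape(-1, 1))
    codebook = np.sort(km.cluster_centers_.ravel())

    # Step 4: assign each original weight to its closest codebook value.
    idx = np.abs(flat[:, None] - codebook[None, :]).argmin(axis=1)
    return codebook[idx].reshape(weights.shape), codebook
```

Because the codebook is fitted on a few thousand KDE samples rather than on the millions of weights in a layer, the clustering cost stays small regardless of layer size, which is the efficiency argument the abstract makes.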

Highlights

  • In the field of image recognition, various types of deep learning models based on convolutional neural networks (CNNs) have been proposed and have achieved excellent results [1,2,3,4,5]

  • This paper proposes a kernel density estimation based non-uniform quantization that can be more efficient in terms of computation time

  • The weights quantizations using the proposed kernel density estimation based k-means quantization (KDE-KM) and kernel density estimation based Lloyd–Max quantizer (KDE-LM) were performed on AlexNet, VGGNet, and ResNet, which are well known in the field of image classification [1,2,4] (see the quantizer sketch following this list)
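
The Lloyd–Max variant (KDE-LM) mentioned above can be sketched in a similar way: decision boundaries and representation levels are iterated against the estimated density rather than the raw weights. The helper below is an illustrative assumption of how such an iteration might be written on a discretized KDE, not the authors' implementation.

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_lloyd_max(weight_sample, n_levels=16, n_iter=50, grid_size=4096):
    """Illustrative Lloyd-Max quantizer driven by a KDE of sampled weights."""
    kde = gaussian_kde(weight_sample)
    lo, hi = weight_sample.min(), weight_sample.max()
    grid = np.linspace(lo, hi, grid_size)
    density = kde(grid)

    # Start from representation levels spread uniformly over the range.
    levels = np.linspace(lo, hi, n_levels)
    for _ in range(n_iter):
        # Decision boundaries sit halfway between adjacent levels.
        bounds = np.concatenate(([lo], (levels[:-1] + levels[1:]) / 2, [hi]))
        new_levels = levels.copy()
        for k in range(n_levels):
            cell = (grid >= bounds[k]) & (grid <= bounds[k + 1])
            if density[cell].sum() > 0:
                # Each level moves to the centroid of the KDE over its cell.
                new_levels[k] = (grid[cell] * density[cell]).sum() / density[cell].sum()
        if np.allclose(new_levels, levels):
            break
        levels = new_levels
    return np.sort(levels)
```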

Summary

Introduction

In the field of image recognition, various types of deep learning models based on CNN have been proposed and achieved excellent results [1,2,3,4,5]. However, the high computational costs of deep learning make it difficult to apply these models in real-time environments and increase the energy consumption of the device [6]. Recently proposed quantization studies of deep learning models focus on simplifying the hardware implementation and further reducing the computational costs while maintaining the performance of the original model. Through this process, we can obtain quantized weights that have a smaller bandwidth than the 32-bit float type. This process is referred to as weights quantization, and it enables more efficient application of various deep learning models. Recent studies related to weights quantization attempt to restore the performance of the original deep learning model. Experiments on various CNN architectures such as AlexNet, VGGNet, and ResNet show that the proposed methodology can be widely applied to various CNN architectures
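
As a baseline illustration of the general idea (and not the paper's non-uniform method), uniform 4-bit quantization simply maps every 32-bit float weight onto one of 16 evenly spaced levels; the helper below is a hypothetical example of that mapping.

```python
import numpy as np

def uniform_quantize(weights, n_bits=4):
    """Baseline uniform quantization: map float weights onto 2**n_bits
    evenly spaced levels spanning the weight range."""
    lo, hi = float(weights.min()), float(weights.max())
    n_levels = 2 ** n_bits
    step = (hi - lo) / (n_levels - 1)
    # Round each weight to the nearest level index, then reconstruct.
    idx = np.round((weights - lo) / step).astype(np.int32)
    return lo + idx * step, idx  # dequantized weights and 4-bit codes

# Example: a random "layer" of 10,000 weights ends up with at most 16 values.
w = np.random.randn(10_000).astype(np.float32)
w_q, codes = uniform_quantize(w)
print(np.unique(w_q).size)
```

Non-uniform schemes such as the proposed KDE-based one place these levels where the weight density is highest instead of spacing them evenly, which is why they can preserve accuracy better at the same bit width.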

Weights Quantization of Deep Learning Model
Scalar Quantization
Kernel Density Estimation
Overview of the Proposed Quantization Process
Kernel Density
Experimental Setup
Performance Evaluations on Quantized CNN Architectures
Visualization of Quantized Filters
Findings
Conclusions
