Abstract
Convolutional neural networks (CNN) have achieved excellent results in the field of image recognition, which classifies objects in images. A typical CNN has a deep architecture that uses a large number of weights and layers to achieve high performance. As a result, a CNN requires relatively large memory space and computational costs, which not only increase the time needed to train the model but also limit real-time application of the trained model. For this reason, various neural network compression methodologies have been studied to use CNNs efficiently on small embedded hardware such as mobile and edge devices. In this paper, we propose a kernel density estimation based non-uniform quantization methodology that performs compression efficiently. The proposed method quantizes weights using a significantly smaller number of sampled weights than the number of original weights. Four-bit quantization experiments on ImageNet classification with various CNN architectures show that the proposed methodology can quantize weights efficiently in terms of computational costs without a significant reduction in model performance.
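As a rough illustration of the idea, the sketch below fits a kernel density estimate to a layer's weights, draws a small number of samples from it, and builds a 16-level (4-bit) non-uniform codebook with k-means on those samples. This is a minimal sketch under stated assumptions, not the authors' exact implementation; the function name, sample count, and bandwidth are illustrative choices.

```python
# Hypothetical sketch of KDE-based non-uniform 4-bit weight quantization (KDE-KM style).
# n_samples and bandwidth are illustrative assumptions, not values from the paper.
import numpy as np
from sklearn.neighbors import KernelDensity
from sklearn.cluster import KMeans

def kde_km_quantize(weights, n_bits=4, n_samples=2048, bandwidth=0.01, seed=0):
    """Quantize a layer's weights to 2**n_bits levels using a KDE-driven codebook."""
    w = weights.ravel().reshape(-1, 1)

    # 1. Fit a kernel density estimate to the weight distribution.
    kde = KernelDensity(kernel="gaussian", bandwidth=bandwidth).fit(w)

    # 2. Draw far fewer samples from the KDE than there are original weights.
    samples = kde.sample(n_samples, random_state=seed)

    # 3. Run k-means on the small sample set to obtain the non-uniform codebook.
    km = KMeans(n_clusters=2 ** n_bits, n_init=10, random_state=seed).fit(samples)
    codebook = km.cluster_centers_.ravel()

    # 4. Map every original weight to its nearest codebook value.
    idx = np.abs(w - codebook[None, :]).argmin(axis=1)
    return codebook[idx].reshape(weights.shape), codebook

# Example: quantize a randomly initialised conv kernel to 16 levels.
conv_w = np.random.randn(64, 3, 3, 3).astype(np.float32) * 0.05
q_w, codebook = kde_km_quantize(conv_w)
```

Because the codebook is computed from a few thousand KDE samples rather than from all of the layer's weights, the clustering step stays cheap even for very large layers, which is the computational advantage the abstract refers to.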
Highlights
In the field of image recognition, various types of deep learning models based on Convolutional neural networks (CNN) have been proposed and achieved excellent results [1,2,3,4,5]
This paper proposes a kernel density estimation based non-uniform quantization that can be more efficient in terms of computation time
Weights quantization using the proposed kernel density estimation based k-means quantization (KDE-KM) and kernel density estimation based Lloyd–Max quantization (KDE-LM) was performed on AlexNet, VGGNet, and ResNet, which are well-known architectures in the field of image classification [1,2,4]
Summary
In the field of image recognition, various types of deep learning models based on CNN have been proposed and achieved excellent results [1,2,3,4,5]. The high computational costs of deep learning make it difficult to apply the model in real-time environments and increase the energy consumption of the device. Recently proposed quantization studies of deep learning models [6] focus not only on simplifying the hardware implementation but also on further reducing the computational costs while maintaining the performance of the original model. Through this process, quantized weights with a smaller bit-width than the original 32-bit floating-point weights can be obtained. This process is referred to as weights quantization, and it enables more efficient application of various deep learning models. Recent studies related to weights quantization attempt to restore the performance of the original model after quantization. Experiments on various CNN architectures such as AlexNet, VGGNet, and ResNet show that the proposed methodology can be widely applied to various CNN architectures
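For context on why lowering the bit-width matters, the minimal sketch below estimates the storage saved by replacing 32-bit floating-point weights with 4-bit codebook indices plus a small float32 codebook. The parameter count is an illustrative assumption (roughly AlexNet-scale), not a figure reported in the paper.

```python
# Minimal sketch of the storage saving from 4-bit weights quantization.
# Each weight is replaced by a 4-bit index into a 16-entry float32 codebook.
n_weights = 61_000_000            # illustrative, roughly AlexNet-scale parameter count
codebook_size = 2 ** 4            # 16 representative values for 4-bit quantization

fp32_bytes = n_weights * 4                            # original 32-bit float storage
quant_bytes = n_weights * 4 / 8 + codebook_size * 4   # 4-bit indices + codebook

print(f"fp32:      {fp32_bytes / 1e6:.1f} MB")
print(f"quantized: {quant_bytes / 1e6:.1f} MB  (~{fp32_bytes / quant_bytes:.1f}x smaller)")
```

Under these assumptions the quantized model needs roughly one eighth of the original weight storage, which is what makes 4-bit quantization attractive for mobile and edge devices.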