AutoQNN: An End-to-End Framework for Automatically Quantizing Neural Networks

Cheng Gong, Qian Deng, Tao Li, Cheng-Kun Du, Ye Lu, Su-Rong Dai

doi:10.1007/s11390-022-1632-9

Abstract

Exploring a suitable quantizing scheme together with an appropriate mixed-precision policy is key to compressing deep neural networks (DNNs) with high efficiency and accuracy. This exploration places a heavy workload on domain experts, so an automatic compression method is needed. However, the huge search space of automatic methods incurs large computing costs, which makes the automatic process difficult to apply in real scenarios. In this paper, we propose an end-to-end framework named AutoQNN for automatically quantizing different layers with different schemes and bitwidths without any human labor. AutoQNN efficiently seeks desirable quantizing schemes and mixed-precision policies for mainstream DNN models by combining three techniques: quantizing scheme search (QSS), quantizing precision learning (QPL), and quantized architecture generation (QAG). QSS introduces five quantizing schemes and defines three new schemes as a candidate set for scheme search, and then uses the differentiable neural architecture search (DNAS) algorithm to seek the layer- or model-desired scheme from the set. To the best of our knowledge, QPL is the first method to learn mixed-precision policies by reparameterizing the bitwidths of quantizing schemes. QPL efficiently optimizes both the classification loss and the precision loss of DNNs and obtains a relatively optimal mixed-precision model within a limited model size and memory footprint. QAG is designed to convert arbitrary architectures into corresponding quantized ones without manual intervention, facilitating end-to-end neural network quantization. We have implemented AutoQNN and integrated it into Keras. Extensive experiments demonstrate that AutoQNN consistently outperforms state-of-the-art quantization methods.
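The abstract describes QSS as a DNAS search over a set of candidate quantizing schemes and QPL as learning reparameterized bitwidths under a precision loss, but gives no formulas. The sketch below illustrates those general ideas in pure Python under stated assumptions: the three candidate quantizers (`uniform_quant`, `pow2_quant`, `binary_quant`), the softmax-weighted mixture used for the relaxation, and the budget-based `precision_loss` are hypothetical stand-ins, not AutoQNN's actual schemes, losses, or Keras API.

```python
import math

# Hypothetical candidate quantizers (the paper's actual scheme set is not listed here).
def uniform_quant(x, bits, clip=1.0):
    """Symmetric uniform (linear) quantization to `bits` bits over [-clip, clip]."""
    levels = 2 ** (bits - 1) - 1          # e.g. 127 levels per sign for 8 bits
    x = max(-clip, min(clip, x))
    return round(x / clip * levels) / levels * clip

def pow2_quant(x, bits, clip=1.0):
    """Power-of-two (logarithmic) quantization: magnitudes snap to clip * 2^-k."""
    if x == 0:
        return 0.0
    max_exp = 2 ** (bits - 1) - 1          # largest exponent k representable
    mag = min(abs(x), clip)
    k = min(max_exp, max(0, round(-math.log2(mag / clip))))
    return math.copysign(clip * 2.0 ** (-k), x)

def binary_quant(x, bits=1, clip=1.0):
    """1-bit sign quantization; `bits` is accepted only for a uniform signature."""
    return clip if x >= 0 else -clip

CANDIDATES = [uniform_quant, pow2_quant, binary_quant]

def softmax(logits):
    """Numerically stable softmax over architecture logits."""
    m = max(logits)
    exps = [math.exp(a - m) for a in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mixed_quant(x, alphas, bits):
    """DNAS-style continuous relaxation: blend candidate outputs by softmax(alphas)."""
    probs = softmax(alphas)
    return sum(p * q(x, bits) for p, q in zip(probs, CANDIDATES))

def chosen_scheme(alphas):
    """After the search, keep the candidate with the largest architecture logit."""
    best = max(range(len(alphas)), key=lambda i: alphas[i])
    return CANDIDATES[best].__name__

def precision_loss(bit_params, layer_sizes, budget_bits):
    """Hypothetical precision loss: relative excess of weight memory over a budget.

    `bit_params` are continuous (reparameterized) per-layer bitwidths, so the
    loss is differentiable in them; the deployed bitwidth would be a rounded value.
    """
    total_bits = sum(b * n for b, n in zip(bit_params, layer_sizes))
    return max(0.0, total_bits - budget_bits) / budget_bits

if __name__ == "__main__":
    print(chosen_scheme([0.1, 2.0, -1.0]))                    # pow2_quant
    print(precision_loss([4.0, 8.0], [1000, 500], 10000))     # 0.0
```

In a full DNAS setup, the `alphas` and bit parameters would be trained by gradient descent alongside the weights, with an objective such as the classification loss plus a weighted precision loss; that training loop is omitted here to keep the sketch self-contained.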

Published in: Journal of Computer Science and Technology

Similar Papers

Quantum Neural Network for Image Classification Using TensorFlow Quantum
J Arun Pandian ... K Kanchanadevi, 01 Jan 2023

Filter-Wise Quantization of Deep Neural Networks for IoT Devices
Hoseung Kim ... Dongkun Shin, 10 Jan 2021

ALigN: A Highly Accurate Adaptive Layerwise Log_2_Lead Quantization of Pre-Trained Neural Networks
Siddharth Gupta ... Kapil Ahuja, IEEE Access, Vol. 8, 01 Jan 2020

Neural Architecture Search for LF-MMI Trained Time Delay Neural Networks
Shoukang Hu ... Xunying Liu, IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 30, 01 Jan 2021