Non-uniform Quantization Research Articles

CNN model computation on edge devices is tightly constrained by limited resource and power budgets, which motivates low-bit quantization that compresses CNN models into 4-bit or lower formats to reduce model size and increase hardware efficiency. Most current low-bit quantization methods use uniform quantization, which maps weight and activation values onto evenly spaced levels and usually loses accuracy because of distribution mismatch. Meanwhile, some non-uniform quantization methods propose specialized representations that better match various distribution shapes, but they are usually difficult to accelerate efficiently in hardware. To achieve low-bit quantization with both high accuracy and hardware efficiency, this paper proposes Universal Power-of-Two (UPoT), a novel low-bit quantization method that represents values as the sum of multiple power-of-two values selected from a series of subsets. By updating the subset contents, UPoT can provide adaptive quantization levels for various distributions. For each CNN model layer, UPoT automatically searches for the optimized distribution that minimizes the quantization error. Moreover, we design an efficient accelerator system with specifically optimized power-of-two multipliers and requantization units.
Evaluations show that the proposed architecture provides high-performance CNN inference with reduced circuit area and energy, and outperforms several mainstream CNN accelerators with 8×–65× higher area efficiency and 2×–19× higher energy efficiency. Further experiments with 4/3/2-bit quantization on ResNet18/50, MobileNet_V2 and EfficientNet models show that UPoT achieves high model accuracy, outperforming other state-of-the-art low-bit quantization methods by 0.3%–6%. The results indicate that our approach provides a highly efficient accelerator for low-bit CNN model quantization with low hardware overhead and good model accuracy.
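The core idea in this abstract, building quantization levels as sums of power-of-two terms drawn from subsets, can be illustrated with a minimal sketch. The subset contents, the function names `pot_sum_levels` and `quantize`, and the sign handling below are assumptions made for illustration only; the abstract does not specify them, and this is not the paper's search procedure or accelerator design.

```python
from itertools import product

def pot_sum_levels(subsets):
    """Return every value reachable by adding one term from each subset."""
    return sorted({sum(terms) for terms in product(*subsets)})

def quantize(x, levels):
    """Map |x| to the nearest level and restore the sign of x."""
    nearest = min(levels, key=lambda q: abs(abs(x) - q))
    return nearest if x >= 0 else -nearest

# Hypothetical subsets (not taken from the paper): each holds zero or
# powers of two, so every level is a sum of at most two shifts.
subsets = [
    [0.0, 2**-1, 2**-2, 2**-3],
    [0.0, 2**-4],
]
levels = pot_sum_levels(subsets)

print(levels)                 # 8 magnitude levels, unevenly spaced
print(quantize(0.3, levels))  # nearest level to 0.3
print(quantize(-0.4, levels)) # sign is kept, magnitude is quantized
```

Changing the entries of `subsets` changes where the levels fall, which is one way to picture how updating subset contents could adapt the levels to a given weight distribution.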

With increasing network downsizing and cost minimization in the deployment of neural network (NN) models, edge computing plays a significant role in modern artificial intelligence. To overcome the memory constraints of less-capable edge systems, a plethora of quantizer models and quantization techniques have been proposed for NN compression, with the goal of fitting the quantized NN (QNN) on the edge device while preserving accuracy to a high extent. NN compression by means of post-training quantization has attracted a lot of research attention, where the efficiency of uniform quantizers (UQs) has been promoted and heavily exploited. In this paper, we propose two novel non-uniform quantizers (NUQs) that prudently utilize one of the two properties of the simplest UQ. Although they share the same quantization rule for specifying the support region, both NUQs have a different starting setting in terms of cell width compared to a standard UQ. The first quantizer, named the simplest power-of-two quantizer (SPTQ), defines cell widths that are multiplied by a power of two. As in the simplest UQ design, the representation levels of SPTQ are the midpoints of the quantization cells. The second quantizer, named the modified SPTQ (MSPTQ), is a more competitive quantizer model. It is an enhanced version of SPTQ in which the decision thresholds are centered between the nearest representation levels, similar to the UQ design. These properties make the novel NUQs relatively simple. Unlike in a UQ, the quantization cells of MSPTQ are not of equal width and the representation levels are not the midpoints of the quantization cells. In this paper, we describe the design procedure of SPTQ and MSPTQ and optimize both for an assumed Laplacian source.
Afterwards, we perform post-training quantization with SPTQ and MSPTQ, study the viability of QNN accuracy, and show the implementation benefits over the case where a UQ with an equal number of quantization cells is used in the QNN for the same classification task. We believe that both NUQs are particularly relevant for memory-constrained environments, where simple and acceptably accurate solutions are of crucial importance.
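The relationship between the two quantizers described in this abstract can be pictured with a small sketch. It rests on one possible reading of the abstract, namely that cell i on the positive side has width `delta * 2**i`; that width rule, the parameter `delta`, and the function names are assumptions for illustration, not the authors' design procedure.

```python
def sptq_cells(delta, n_cells):
    """Positive-side thresholds and midpoint levels.

    Assumption: cell i has width delta * 2**i.
    """
    thresholds = [0.0]
    for i in range(n_cells):
        thresholds.append(thresholds[-1] + delta * 2**i)
    # SPTQ-style levels: the midpoint of each cell.
    levels = [(lo + hi) / 2 for lo, hi in zip(thresholds, thresholds[1:])]
    return thresholds, levels

def msptq_inner_thresholds(levels):
    """Inner thresholds moved halfway between neighbouring levels."""
    return [(a + b) / 2 for a, b in zip(levels, levels[1:])]

thresholds, levels = sptq_cells(delta=1.0, n_cells=3)
inner = msptq_inner_thresholds(levels)

print(thresholds)  # cell edges with widths 1, 2, 4
print(levels)      # midpoints of those cells
print(inner)       # recentred thresholds; levels are no longer cell midpoints
```

In this sketch the representation levels stay the same in both variants and only the inner decision thresholds move, which matches the abstract's remark that MSPTQ levels are not the midpoints of its cells.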

Articles published on Non-uniform Quantization

An Energy-and-Area-Efficient CNN Accelerator for Universal Powers-of-Two Quantization

Design and Analysis of Hardware-Limited Non-Uniform Task-Based Quantizers

Point Cloud Soft Multicast for Untethered XR Users

Uplink Signal Detection via Look-Up Table-Based AMP for Massive MIMO Systems

Digital Event-Based Stabilization of Nonlinear Time-Delay Systems

A Non-Idealities Aware Software–Hardware Co-Design Framework for Edge-AI Deep Neural Network Implemented on Memristive Crossbar

Cooperative Constrained Control of Autonomous Vehicles With Nonuniform Input Quantization

Finite-Bit Quantization for Distributed Algorithms With Linear Convergence

Photonics-Assisted Millimeter-Wave Communication System Based on Low-Bit Gaussian Mixture Model Adaptive Vector Quantization

Two Novel Non-Uniform Quantizers with Application in Post-Training Quantization

Data reduction through optimized scalar quantization for more compact neural networks

Analysis of Compressing PAPR-Reduced OFDM IQ Samples for Cloud Radio Access Network

Application of Intelligent Analysis Technology of Football Video Based on Online Target Tracking Algorithm of Motion Characteristics in Football Training

Adaptive gradients and weight projection based on quantized neural networks for efficient image classification

Extracting More Quantum Randomness With Non-Uniform Quantization

Feature compensation network based on non-uniform quantization of channels for digital image global manipulation forensics

Granger causality from quantized measurements

A GMM-based non-uniform quantization scheme for improving low-resolution IMDD-UFMC system performance

NQRELoc: AP Selection via Nonuniform Quantization RSSI Entropy for Indoor Localization

A Personalized Compression Method for Steady-State Visual Evoked Potential EEG Signals
