Arbitrary Quantization Research Articles

In edge computing, quantizing convolutional neural networks (CNNs) to extremely low bit widths can significantly reduce the storage and computational burden on embedded hardware, thereby improving computational efficiency. However, such quantization also causes substantial decreases in detection accuracy. This paper proposes a method called Adaptive Global Power-of-Two Ternary Quantization Based on Unfixed Boundary Thresholds (APTQ). APTQ achieves adaptive quantization by quantizing each filter into two binary subfilters represented as power-of-two values. This addresses two sources of accuracy degradation: the limited expressive capacity of low-bit-width weight values, and the mismatch between fixed quantization boundaries and the uneven actual weight distribution. The method reduces the accuracy loss while remaining hardware-friendly because of the power-of-two quantization. This paper also extends the APTQ algorithm into the APQ quantization algorithm, which can adapt to arbitrary quantization bit widths, and designs dedicated edge-deployment convolutional computation modules for the resulting quantized models. Quantization comparison experiments with multiple commonly used CNN models on the CIFAR10, CIFAR100, and Mini-ImageNet data sets verify that the APTQ and APQ algorithms achieve better accuracy than most state-of-the-art quantization algorithms and incur very low accuracy loss in certain CNNs (e.g., the accuracy loss of the APTQ ternary ResNet-56 model on CIFAR10 is 0.13%). The dedicated convolutional computation modules enable the corresponding quantized models to occupy fewer on-chip hardware resources in edge chips, thereby improving computational efficiency. This adaptive CNN quantization method, combined with the power-of-two quantization results, strikes a balance between quantization accuracy and deployment efficiency in embedded hardware, and offers useful insights for the industrial edge computing domain.
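The abstract gives no formulas, so the following is only a rough illustration of the general idea of ternary power-of-two quantization, not the APTQ algorithm itself. It assumes a simple statistics-based threshold (the `threshold_ratio` parameter and both function names are hypothetical) in place of the adaptive boundary thresholds of the paper, and it uses one well-known identity for writing a ternary filter as the sum of two binary subfilters.

```python
import math


def ternary_pow2_quantize(weights, threshold_ratio=0.7):
    """Quantize a flat list of weights to {-2**k, 0, +2**k}.

    Assumption: the boundary is threshold_ratio * mean(|w|), a common
    heuristic for ternary weights. APTQ's own thresholds are adaptive
    and are not reproduced here.
    """
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    threshold = threshold_ratio * mean_abs
    kept = [abs(w) for w in weights if abs(w) > threshold]
    if not kept:
        return [0.0] * len(weights), None
    # Snap the average surviving magnitude to the nearest power of two.
    k = round(math.log2(sum(kept) / len(kept)))
    step = 2.0 ** k
    quantized = [
        math.copysign(step, w) if abs(w) > threshold else 0.0
        for w in weights
    ]
    return quantized, k


def split_into_binary_subfilters(quantized, k):
    """Write a ternary filter as 2**(k-1) * (b1 + b2) with b1, b2 in {-1, +1}."""
    b1, b2 = [], []
    for q in quantized:
        if q > 0:
            b1.append(1)
            b2.append(1)
        elif q < 0:
            b1.append(-1)
            b2.append(-1)
        else:
            # Opposite signs cancel, which encodes a zero weight.
            b1.append(1)
            b2.append(-1)
    return b1, b2, 2.0 ** (k - 1)


weights = [0.9, -0.05, 0.4, -1.1, 0.02, 0.6]
quantized, k = ternary_pow2_quantize(weights)
b1, b2, half_step = split_into_binary_subfilters(quantized, k)
print(quantized, k, b1, b2, half_step)
```

Because every nonzero weight is a signed power of two, a multiply in the convolution can in principle be replaced by a bit shift, which is the hardware-friendly property the abstract refers to.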

Although considerable progress has been made in neural network quantization for efficient inference, existing methods do not scale to heterogeneous devices: one dedicated model needs to be trained, transmitted, and stored for each specific hardware setting, incurring considerable costs in model training and maintenance. In this paper, we study a new vertical-layered representation of neural network weights for encapsulating all quantized models into a single one. It represents weights as a group of bits (vertical layers) organized from the most significant bit (also called the basic layer) to less significant bits (enhance layers). Hence, a neural network with an arbitrary quantization precision can be obtained by adding the corresponding enhance layers to the basic layer. However, we empirically find that models obtained with existing quantization methods suffer severe performance degradation when adapted to the vertical-layered weight representation. To this end, we propose a simple once quantization-aware training (QAT) scheme for obtaining high-performance vertical-layered models. Our design incorporates a cascade downsampling mechanism with multi-objective optimization, which trains the shared source model weights so that they are updated simultaneously while considering the performance of all networks. After the model is trained, the vertical-layered network is constructed as follows: the lowest bit-width quantized weights become the basic layer, and every bit dropped along the downsampling process acts as an enhance layer. Our design is extensively evaluated on the CIFAR-100 and ImageNet datasets. Experiments show that the proposed vertical-layered representation and the once QAT scheme are effective in embodying multiple quantized networks in a single one, allow one-time training, and deliver performance comparable to that of quantized models tailored to any specific bit width.
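The bit-layer layout described in this abstract can be sketched on a single weight code. The snippet below is an illustration under stated assumptions, not the paper's implementation: it treats a weight as an unsigned integer code, uses plain truncation as the downsampling step, and the function names `split_vertical_layers` and `assemble_code` are hypothetical. Sign handling, scaling factors, and the once QAT training scheme are left out.

```python
def split_vertical_layers(code, total_bits, base_bits):
    """Split an unsigned weight code into a basic layer and enhance layers.

    The basic layer holds the top base_bits bits; each enhance layer is
    one further bit, ordered from more significant to less significant.
    """
    if not 0 < base_bits <= total_bits:
        raise ValueError("base_bits must be in 1..total_bits")
    if not 0 <= code < (1 << total_bits):
        raise ValueError("code does not fit in total_bits")
    basic = code >> (total_bits - base_bits)
    enhance = [
        (code >> shift) & 1
        for shift in range(total_bits - base_bits - 1, -1, -1)
    ]
    return basic, enhance


def assemble_code(basic, enhance, extra_bits):
    """Rebuild a higher-precision code by appending enhance layers."""
    code = basic
    for bit in enhance[:extra_bits]:
        code = (code << 1) | bit
    return code


basic, enhance = split_vertical_layers(0b10110110, total_bits=8, base_bits=2)
for extra in (0, 2, 6):
    # Each added enhance layer extends the code by one bit of precision.
    print(2 + extra, "bits:", bin(assemble_code(basic, enhance, extra)))
```

In this layout a device that only supports low precision would load just the basic layer, while a more capable device appends as many enhance layers as it can use, so a single stored model serves every bit width.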

Articles published on Arbitrary Quantization

MBQuant: A novel multi-branch topology method for arbitrary bit-width network quantization

Joint-Guided Distillation Binary Neural Network via Dynamic Channel-Wise Diversity Enhancement for Object Detection

Multipurpose Deep-Learning Accelerator for Arbitrary Quantization With Reduction of Storage, Logic, and Latency Waste

Adaptive Global Power-of-Two Ternary Quantization Algorithm Based on Unfixed Boundary Thresholds

Vertical Layering of Quantized Neural Networks for Heterogeneous Inference

Performance Analysis and Optimization of Multicell Massive MIMO With Variable-Resolution ADCs Over Correlated Rayleigh Fading Channels

Positivity for quantum cluster algebras from unpunctured orbifolds

An expansion formula for quantum cluster algebras from unpunctured triangulated surfaces

On the generalization of Moyal equation for an arbitrary linear quantization

Discrete Phase Shifters-Based Hybrid Precoding for Full-Duplex mmWave Relaying Systems

Generalized Evolution Equation of Wigner Function for an Arbitrary Linear Quantization

Reprogrammable Spatiotemporally Modulated Graphene-Based Functional Metasurfaces

Measuring the polarization of electromagnetic fields using Rabi-rate measurements with spatial resolution: Experiment and theory

Quantized stabilization of networked control systems with actuator saturation

Quantized global parametrization

Graded quiver varieties, quantum cluster algebras and dual canonical basis

Stabilization of fuzzy systems with quantization and packet dropout

Quantized Consensus of Multi‐Agent Systems Via Broadcast Gossip Algorithms

Rotational symmetry of classical orbits, arbitrary quantization of angular momentum and the role of the gauge field in two-dimensional space

Robust watermarking procedure based on JPEG discrete cosine transform image compression
