Full-precision Counterparts Research Articles

Decentralized distributed learning is the key to enabling large-scale machine learning (training) on edge devices using private user-generated local data, without relying on the cloud. However, practical realization of such on-device training is limited by communication and compute bottlenecks. In this paper, we propose and show the convergence of low precision decentralized training that aims to reduce the computational complexity and communication cost of decentralized training. Many feedback-based compression techniques have been proposed in the literature to reduce communication costs. To the best of our knowledge, there is no work that applies and demonstrates compute-efficient training techniques such as quantization, pruning, etc., for peer-to-peer decentralized learning setups. Since real-world applications have a significant skew in the data distribution, we design "Range-EvoNorm" as the normalization activation layer, which is better suited for low precision training over non-IID data. Moreover, we show that the proposed low precision training can be used in synergy with other communication compression methods, decreasing the communication cost further. Our experiments indicate that 8-bit decentralized training has minimal accuracy loss compared to its full precision counterpart even with non-IID data. However, when low precision training is accompanied by communication compression through sparsification, we observe a 1−2% drop in accuracy. The proposed low precision decentralized training decreases computational complexity, memory usage, and communication cost by ∼4× and compute energy by a factor of ∼20×, while trading off less than 1% accuracy for both IID and non-IID data. In particular, for higher skew values, we observe an increase in accuracy (by ∼0.5%) with low precision training, indicating the regularization effect of the quantization.
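The ∼4× memory and communication figure quoted in the abstract above is consistent with replacing 32-bit floats by 8-bit integers. The following is a minimal sketch of that arithmetic using symmetric uniform quantization; it is not the paper's quantization scheme, and the function names `quantize_int8` and `dequantize_int8` as well as the sample weights are illustrative assumptions.

```python
import struct

def quantize_int8(values):
    """Symmetric uniform quantization to signed 8-bit codes (illustrative only)."""
    max_abs = max(abs(v) for v in values)
    # One shared scale maps the largest magnitude to the code +/-127.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale

def dequantize_int8(codes, scale):
    """Map 8-bit codes back to approximate real values."""
    return [c * scale for c in codes]

# Hypothetical weights, chosen only for the demonstration.
weights = [0.5, -1.0, 0.25, 0.0, 0.75]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)

# Payload sizes if the values were sent as float32 versus int8.
# The single shared scale is ignored here; it adds a small constant overhead.
fp32_bytes = len(struct.pack(f"{len(weights)}f", *weights))
int8_bytes = len(struct.pack(f"{len(codes)}b", *codes))
compression_ratio = fp32_bytes / int8_bytes

# Rounding to the nearest code bounds the per-value error by about scale / 2.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(compression_ratio, max_error)
```

The sketch covers only the storage and transfer side; the compute-energy saving reported in the abstract depends on hardware arithmetic and is not modeled here.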

In this paper, we propose to train binarized convolutional neural networks (CNNs), which are of significant importance for deploying deep learning to mobile devices with limited power capacity and computing resources. Previous works on quantizing CNNs often seek to approximate the floating-point information of weights and/or activations using a set of discrete values. Such methods, termed value approximation here, are typically built on the same network architecture as the full-precision counterpart. Instead, we take a new "structured approximation" view for network quantization: it is possible and valuable to exploit flexible architecture transformation when learning low-bit networks, which can achieve even better performance than the original networks in some cases. In particular, we propose a "group decomposition" strategy, termed GroupNet, which divides a network into desired groups. Interestingly, with our GroupNet strategy, each full-precision group can be effectively reconstructed by aggregating a set of homogeneous binary branches. We also propose to learn effective connections among groups to improve the representation capability. To improve the model capacity, we propose to dynamically execute sparse binary branches conditioned on input features while preserving the computational cost. More importantly, the proposed GroupNet shows strong flexibility for a few vision tasks. For instance, we extend GroupNet for accurate semantic segmentation by embedding the rich context into the binary structure. The proposed GroupNet also shows strong performance on object detection. Experiments on image classification, semantic segmentation, and object detection tasks demonstrate the superior performance of the proposed methods over various quantized networks in the literature. Moreover, the speedup and runtime memory cost compared with related quantization strategies are analyzed on GPU platforms, which serves as a strong benchmark for further research.
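To make the "value approximation" baseline mentioned in the abstract above concrete, here is a minimal sketch that approximates real-valued weights by a scaled sign vector and then by a sum of several such binary terms. This is a generic residual binarization illustration, not GroupNet's architecture-level group decomposition; the function names and sample weights are assumptions made for the example.

```python
def binarize(values):
    """Return (alpha, signs): a scale and +/-1 codes approximating `values`."""
    # The mean absolute value is the least-squares optimal scale for sign codes.
    alpha = sum(abs(v) for v in values) / len(values)
    signs = [1 if v >= 0 else -1 for v in values]
    return alpha, signs

def residual_binary_approx(values, num_terms):
    """Approximate `values` by summing `num_terms` scaled binary vectors."""
    residual = list(values)
    approx = [0.0] * len(values)
    for _ in range(num_terms):
        # Binarize whatever the previous terms failed to capture.
        alpha, signs = binarize(residual)
        for i, s in enumerate(signs):
            approx[i] += alpha * s
            residual[i] -= alpha * s
    return approx

def squared_error(a, b):
    """Sum of squared differences between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical full-precision weights, chosen only for the demonstration.
weights = [0.9, -0.4, 0.1, -0.7, 0.3, -0.2]
err_zero = squared_error(weights, [0.0] * len(weights))
err_one = squared_error(weights, residual_binary_approx(weights, 1))
err_three = squared_error(weights, residual_binary_approx(weights, 3))
print(err_zero, err_one, err_three)
```

Each added binary term reduces the squared error by the number of elements times the square of that term's scale, which gives some intuition for why aggregating several binary components can recover more of the full-precision information than a single one.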

Articles published on Full-precision Counterparts

AMED: Automatic Mixed-Precision Quantization for Edge Devices

RAD-BNN: Regulating activation distribution for accurate binary neural network

AQ-DETR: Low-Bit Quantized Detection Transformer with Auxiliary Queries

Toward Pixel-Level Precision for Binary Super-Resolution With Mixed Binary Representation

Binarizing by Classification: Is Soft Function Really Necessary?

Quantization via Distillation and Contrastive Learning

Bit-Weight Adjustment for Bridging Uniform and Non-Uniform Quantization to Build Efficient Image Classifiers

Extremely Sparse Networks via Binary Augmented Pruning for Fast Image Classification

Quantformer: Learning Extremely Low-Precision Vision Transformers

Bi-CapsNet: A Binary Capsule Network for EEG-based Emotion Recognition

E2FIF: Push the limit of Binarized Deep Imagery Super-resolution using End-to-end Full-precision Information Flow

Cellular Binary Neural Network for Accurate Image Classification and Semantic Segmentation

Residual Quantization for Low Bit-Width Neural Networks

LNS-Madam: Low-Precision Training in Logarithmic Number System Using Multiplicative Weight Update

Bimodal-Distributed Binarized Neural Networks

Low precision decentralized distributed training over IID and non-IID data

Elastic-Link for Binarized Neural Networks

PB-GCN: Progressive binary graph convolutional networks for skeleton-based action recognition

Structured Binary Neural Networks for Image Recognition

Low-Bitwidth Convolutional Neural Networks for Wireless Interference Identification
