LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks
Although weight and activation quantization is an effective approach for Deep Neural Network (DNN) compression and has great potential to increase inference speed by leveraging bit operations, there is still a noticeable gap in prediction accuracy between the quantized model and the full-precision model. To address this gap, we propose to jointly train a quantized, bit-operation-compatible DNN and its associated quantizers, as opposed to using fixed, handcrafted quantization schemes such as uniform or logarithmic quantization. Our method for learning the quantizers applies to both network weights and activations with arbitrary-bit precision, and our quantizers are easy to train. Comprehensive experiments on the CIFAR-10 and ImageNet datasets show that our method works consistently well for various network structures such as AlexNet, VGG-Net, GoogLeNet, ResNet, and DenseNet, surpassing previous quantization methods in accuracy by an appreciable margin. Code available at https://github.com/Microsoft/LQ-Nets
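The core idea — quantization levels built from a small learnable basis rather than a fixed uniform or logarithmic grid — can be sketched as follows. This is a toy illustration; the function names and the 0/1 combination scheme are a simplification, not the authors' exact formulation:

```python
from itertools import product

def quant_levels(basis):
    """All levels reachable as 0/1 combinations of the learnable basis.

    With basis [1.0, 2.0] the representable levels are 0, 1, 2, 3;
    training the basis moves these levels to fit the data."""
    return sorted(sum(b * v for b, v in zip(bits, basis))
                  for bits in product((0, 1), repeat=len(basis)))

def quantize(x, basis):
    """Map a full-precision value to its nearest representable level."""
    return min(quant_levels(basis), key=lambda lv: abs(lv - x))
```

Because every level is a binary combination of basis elements, inner products at inference time reduce to bit operations plus a few scalings, which is what makes such quantizers hardware-friendly.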
- Conference Article
11
- 10.1109/wacv45572.2020.9093377
- Mar 1, 2020
State-of-the-art Deep Neural Networks (DNNs) are typically too cumbersome to be practically useful in portable electronic devices. As such, several works pursue model compression, which seeks to drastically reduce computational cost (FLOPs) and the memory footprint for storage. Many of these works achieve unstructured compression, where the compressed models are not directly useful since dedicated hardware and specialized algorithms are required for storing sparse weights and for fast sparse matrix-vector multiplication, respectively. In this paper, we propose structured compression of large DNNs using debiased elastic group LASSO (DEGL), which is motivated by the complementary characteristics of its individual components: the group LASSO penalty enforces structured sparsity, the l2-norm penalty promotes feature grouping, and debiasing disentangles the sparsity and shrinkage effects of group LASSO. We perform extensive experiments by applying DEGL to different DNN architectures, including LeNet, VGG, AlexNet and ResNet, on the MNIST, CIFAR-10, CIFAR-100 and ImageNet datasets. Furthermore, we validate the effectiveness of our proposal on domain adaptation using the Oxford-102 flower species and Food-5K datasets. Results show that DEGL can compress DNNs by several folds with little or no loss of performance. In particular, DEGL outperforms conventional group LASSO and several other state-of-the-art methods that perform structured compression.
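The structured-sparsity component of this recipe rests on the group-LASSO proximal step, which shrinks or zeroes an entire group of weights at once. A minimal sketch (a simplified form for illustration; the full DEGL update also involves the l2 term and debiasing):

```python
import math

def group_soft_threshold(group, lam):
    """Proximal step for the group-LASSO penalty: scale the whole
    group toward zero, and zero it out entirely when its l2 norm
    falls below the threshold lam."""
    norm = math.sqrt(sum(w * w for w in group))
    if norm <= lam:
        return [0.0] * len(group)  # the whole group is pruned
    scale = 1.0 - lam / norm
    return [scale * w for w in group]
```

Debiasing then retrains the surviving groups without the penalty, removing the shrinkage that the proximal step introduced.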
- Research Article
1
- 10.1049/el.2019.2376
- Sep 1, 2019
- Electronics Letters
GenSyth: a new way to understand deep learning
- Research Article
196
- 10.1109/tc.2019.2914438
- Oct 1, 2019
- IEEE Transactions on Computers
Deep neural networks (DNNs) have begun to have a pervasive impact on various applications of machine learning. However, the problem of finding an optimal DNN architecture for large applications is challenging. Common approaches go for deeper and larger DNN architectures but may incur substantial redundancy. To address these problems, we introduce a network growth algorithm that complements network pruning to learn both weights and compact DNN architectures during training. We propose a DNN synthesis tool (NeST) that combines both methods to automate the generation of compact and accurate DNNs. NeST starts with a randomly initialized sparse network called the seed architecture. It iteratively tunes the architecture with gradient-based growth and magnitude-based pruning of neurons and connections. Our experimental results show that NeST yields accurate, yet very compact DNNs, across a wide range of seed architectures. For the LeNet-300-100 (LeNet-5) architecture, we reduce network parameters by 70.2x (74.3x) and floating-point operations (FLOPs) by 79.4x (43.7x). For the AlexNet and VGG-16 architectures, we reduce network parameters (FLOPs) by 15.7x (4.6x) and 30.2x (8.6x), respectively. NeST's grow-and-prune paradigm delivers significant additional parameter and FLOPs reduction relative to pruning-only methods.
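The pruning half of the grow-and-prune loop is magnitude-based; a minimal sketch of that step (an illustration, not NeST's actual implementation, which also grows neurons and connections using gradient information):

```python
def magnitude_prune(weights, fraction):
    """Zero out the smallest-magnitude fraction of the weights,
    the magnitude-based pruning step of a grow-and-prune loop."""
    k = int(len(weights) * fraction)
    if k == 0:
        return list(weights)
    # threshold at the k-th smallest absolute value
    thresh = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= thresh else w for w in weights]
```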
- Research Article
24
- 10.1016/j.neunet.2021.08.028
- Sep 8, 2021
- Neural Networks
Nonlinear tensor train format for deep neural network compression
- Conference Article
12
- 10.1109/vlsid.2019.00056
- Jan 1, 2019
The recent trend in deep neural networks (DNNs) research is to make the networks more compact. The motivation behind designing compact DNNs is to improve energy efficiency: by virtue of having a lower memory footprint, compact DNNs require fewer off-chip accesses, which improves energy efficiency. However, we show that making DNNs compact has indirect and subtle implications which are not well-understood. Reducing the number of parameters in DNNs increases the number of activations which, in turn, increases the memory footprint. We evaluate several recently-proposed compact DNNs on a Tesla P100 GPU and show that their "activations to parameters ratio" ranges from 1.4 to 32.8. Further, the "memory-footprint to model size ratio" ranges from 15 to 443. This shows that a higher number of activations causes a large memory footprint, which increases on-chip/off-chip data movement. Furthermore, these parameter-reducing techniques reduce the arithmetic intensity, which increases the on-chip/off-chip memory bandwidth requirement. Due to these factors, the energy efficiency of compact DNNs may be significantly reduced, which is against the original motivation for designing compact DNNs.
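The ratios discussed above follow directly from per-layer counting; a small sketch (the formulas are standard, the example numbers are illustrative):

```python
def conv_params(c_in, c_out, k):
    """Parameter count of a k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def conv_activations(c_out, h, w):
    """Output activation count for an h x w feature map."""
    return c_out * h * w

# A 3x3 conv with 64 input/output channels on a 56x56 map has
# 36,864 parameters but 200,704 output activations (ratio > 5),
# illustrating how activation memory can dominate model size.
```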
- Book Chapter
- 10.1007/978-3-319-68759-9_70
- Jan 1, 2017
Despite their great success, deep neural networks (DNN) are hard to deploy on devices with limited hardware, such as mobile phones, because of their massive number of parameters. Many methods have been proposed for DNN compression, i.e., to reduce the parameters of DNN models. However, almost all of them are based on reference models, which must first be trained. In this paper, we propose an approach to perform DNN training and compression simultaneously. More concretely, a dynamic and adaptive threshold (DAT) framework is utilized to prune a DNN gradually by changing the pruning threshold during training. Experiments show that DAT can not only reach a compression rate comparable to or better than state-of-the-art DNN compression methods, almost without loss of accuracy, but also beat DNN sparse training methods by a large margin.
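The DAT idea — a pruning threshold that ramps up during training — can be sketched minimally as follows (the linear schedule here is a plausible stand-in; the paper's dynamic, adaptive rule differs):

```python
def dat_threshold(step, total_steps, final_thresh):
    """Linearly ramp the pruning threshold from 0 to its final
    value over the course of training."""
    return final_thresh * min(1.0, step / total_steps)

def prune_step(weights, thresh):
    """Zero out weights whose magnitude falls below the current
    threshold; applied periodically during training."""
    return [0.0 if abs(w) < thresh else w for w in weights]
```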
- Conference Article
4
- 10.1145/2964284.2967273
- Oct 1, 2016
Deep neural networks generally involve some layers with millions of parameters, making them difficult to deploy and update on devices with limited resources such as mobile phones and other smart embedded systems. In this paper, we propose a scalable representation of the network parameters, so that different applications can select the most suitable bit rate of the network based on their own storage constraints. Moreover, when a device needs to upgrade to a high-rate network, the existing low-rate network can be reused, and only some incremental data need to be downloaded. We first hierarchically quantize the weights of a pre-trained deep neural network to enforce weight sharing. Next, we adaptively select the bits assigned to each layer given the total bit budget. After that, we retrain the network to fine-tune the quantized centroids. Experimental results show that our method can achieve scalable compression with graceful degradation in performance.
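The weight-sharing step underlying such quantization is one-dimensional clustering of the weights around shared centroids; a minimal Lloyd's-algorithm sketch (a simplification of the quantization step, before bit allocation and fine-tuning):

```python
def kmeans_1d(values, centroids, iters=10):
    """Lloyd's algorithm in one dimension: assign each weight to its
    nearest centroid, then move each centroid to the mean of its
    members. The final centroids become the shared weight values."""
    cents = list(centroids)
    for _ in range(iters):
        assign = [min(range(len(cents)), key=lambda j: abs(v - cents[j]))
                  for v in values]
        for j in range(len(cents)):
            members = [v for v, a in zip(values, assign) if a == j]
            if members:
                cents[j] = sum(members) / len(members)
    return cents, assign
```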
- Research Article
15
- 10.1016/j.neunet.2022.02.024
- Mar 8, 2022
- Neural Networks
Compression of Deep Neural Networks based on quantized tensor decomposition to implement on reconfigurable hardware platforms
- Conference Article
- 10.1109/icdmw.2018.00178
- Nov 1, 2018
Deep neural networks have demonstrated their superiority in many fields. Their excellent performance relies on a large number of parameters, resulting in a series of problems, including high memory and computation requirements and overfitting, which seriously impede the application of deep neural networks to many practical tasks. A considerable number of model compression methods have been proposed to reduce the number of parameters in deep neural networks, among which one family of methods pursues sparsity. In this paper, we propose to combine the l1,1 and l1,2 norms as the regularization term of the network's objective function. We introduce groups of weights: the l1,1 regularizer can zero out weights at both the inter-group and intra-group level, while the l1,2 regularizer obtains intra-group sparsity and encourages even weights among groups. We adopt proximal gradient descent to solve the objective function regularized by our combined regularization. Experimental results demonstrate the effectiveness of the proposed regularizer when comparing it with other baseline regularizers.
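Under one common reading of this notation (the l1,1 term summing absolute values and the l1,2 term summing per-group l2 norms — an interpretation to be checked against the paper), the combined penalty is:

```python
import math

def combined_penalty(groups, lam1, lam2):
    """Combined regularizer: an elementwise l1 term plus a sum of
    per-group l2 norms, weighted by lam1 and lam2 respectively."""
    l11 = sum(abs(w) for g in groups for w in g)
    l12 = sum(math.sqrt(sum(w * w for w in g)) for g in groups)
    return lam1 * l11 + lam2 * l12
```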
- Research Article
4
- 10.1109/jstsp.2020.2992384
- May 1, 2020
- IEEE Journal of Selected Topics in Signal Processing
This paper introduces an adaptive sampling methodology for automated compression of Deep Neural Networks (DNNs) for accelerated inference on resource-constrained platforms. Modern DNN compression techniques comprise various hyperparameters that require per-layer customization. Our objective is to locate an optimal hyperparameter configuration that leads to the lowest model complexity while adhering to a desired inference accuracy. We design a score function that evaluates the aforementioned optimality. The optimization problem is then formulated as searching for the maximizers of this score function. To this end, we devise a non-uniform adaptive sampler that aims at reconstructing the band-limited score function. We reduce the total number of required objective function evaluations by realizing a targeted sampler. We propose three adaptive sampling methodologies, i.e., AdaNS-Zoom, AdaNS-Genetic, and AdaNS-Gaussian, where new batches of samples are chosen based on the history of previous evaluations. Our algorithms start sampling from a uniform distribution over the entire search space and iteratively adapt the sampling distribution to achieve the highest density around the function maxima. This, in turn, allows for a low-error reconstruction of the objective function around its maximizers. Our extensive evaluations corroborate AdaNS's effectiveness by outperforming existing rule-based and Reinforcement Learning methods in terms of DNN compression rate and/or inference accuracy.
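The Gaussian variant can be caricatured in one dimension: start uniform, then re-center each new batch on the best configuration found so far while shrinking the sampling width. This toy sketch is only illustrative; the real method samples layer-wise hyperparameter vectors:

```python
import random

def adaptive_gaussian_search(score, low, high, batches=5,
                             batch_size=20, sigma=0.5, seed=0):
    """Draw an initial uniform batch, then Gaussian batches centered
    on the best sample so far, shrinking sigma each round."""
    rng = random.Random(seed)
    samples = [rng.uniform(low, high) for _ in range(batch_size)]
    best = max(samples, key=score)
    for _ in range(batches - 1):
        batch = [min(max(rng.gauss(best, sigma), low), high)
                 for _ in range(batch_size)]
        best = max(batch + [best], key=score)  # keep the incumbent
        sigma *= 0.7  # concentrate sampling around the maximizer
    return best
```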
- Research Article
3
- 10.1109/tnnls.2022.3217403
- Jul 1, 2024
- IEEE transactions on neural networks and learning systems
Training deep neural networks (DNNs) typically requires massive computational power. Existing DNNs exhibit low time and storage efficiency due to their high degree of redundancy. In contrast to most existing DNNs, biological and social networks with vast numbers of connections are highly efficient and exhibit scale-free properties indicative of the power law distribution, which can originate from preferential attachment in growing networks. In this work, we ask whether the topology of the best performing DNNs shows a power law similar to biological and social networks, and how to use power law topology to construct well-performing and compact DNNs. We first find that the connectivities of sparse DNNs can be modeled by a truncated power law distribution, one of the variations of the power law. A comparison of different DNNs reveals that the best performing networks correlate highly with the power law distribution. We further model preferential attachment in DNN evolution and find that continual learning in networks that grow with tasks correlates with the process of preferential attachment. These identified power law dynamics in DNNs can lead to the construction of highly accurate and compact DNNs based on preferential attachment. Inspired by these findings, two novel applications have been proposed, including evolving optimal DNNs in sparse network generation and continual learning tasks with efficient network growth using power law dynamics. Experimental results indicate that the proposed applications can speed up training, save storage, and learn with fewer samples than other well-established baselines. Our demonstration of preferential attachment and power law in well-performing DNNs offers insight into designing and constructing more efficient deep learning.
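The preferential-attachment mechanism referenced above is easy to make concrete: each new node connects to existing nodes with probability proportional to their degree (Barabasi-Albert-style growth; this generic sketch is not the paper's DNN-specific variant):

```python
import random

def preferential_attachment(n_nodes, m, seed=0):
    """Grow a graph where each new node attaches to m distinct
    existing nodes chosen with probability proportional to degree,
    yielding the heavy-tailed degree distributions discussed above."""
    rng = random.Random(seed)
    # start from a small fully connected core of m + 1 nodes
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    pool = [v for e in edges for v in e]  # degree-weighted sampling pool
    for new in range(m + 1, n_nodes):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(pool))  # degree-proportional pick
        for t in chosen:
            edges.append((new, t))
            pool.extend((new, t))
    return edges
```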
- Dissertation
- 10.17760/d20383685
- May 10, 2021
High-performance and energy-efficient deep learning for resource-constrained devices
- Book Chapter
- 10.1007/978-3-319-68560-1_22
- Jan 1, 2017
In the last years, deep neural networks have revolutionized machine learning tasks. However, the design of deep neural network architectures is still based on trial-and-error procedures, and the resulting models are usually complex, with high computational cost. This is the reason behind the efforts made in the deep learning community to create small and compact models with accuracy comparable to current deep neural networks. In the literature, different methods to reach this goal are presented; among them, techniques based on low-rank factorization are used to compress pre-trained models, with the aim of providing a more compact version of them without losing their effectiveness. Despite their promising results, these techniques produce auxiliary structures between network layers; this work shows that it is possible to overcome the need for such elements by using simple regularization techniques. We tested our approach on the VGG16 model, obtaining a four-fold reduction without loss in accuracy and avoiding supplementary structures between the network layers.
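The low-rank compression being targeted can be written in a few lines with a truncated SVD: a weight matrix W is replaced by two thin factors whose product approximates it. A generic sketch (the baseline technique, not this paper's regularization scheme, which avoids auxiliary structures):

```python
import numpy as np

def low_rank_compress(W, rank):
    """Best rank-r approximation of W via truncated SVD, returned as
    two thin factors A (m x r) and B (r x n); storing A and B instead
    of W saves parameters whenever r * (m + n) < m * n."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]
    B = Vt[:rank, :]
    return A, B
```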
- Research Article
42
- 10.1109/jiot.2021.3063497
- Mar 5, 2021
- IEEE Internet of Things Journal
Deep neural networks (DNNs) have shown great success in completing complex tasks. However, DNNs inevitably bring high computational cost and storage consumption due to the complexity of their hierarchical structures, thereby hindering their wide deployment on Internet-of-Things (IoT) devices, which have limited computational capability and storage capacity. Therefore, it is necessary to investigate technologies for compacting DNNs. Despite tremendous advances in compacting DNNs, few surveys summarize compacting-DNN technologies, especially for IoT applications. Hence, this article presents a comprehensive study of compacting-DNN technologies. We categorize them into three major types: 1) network model compression; 2) knowledge distillation (KD); and 3) modification of network structures. We also elaborate on the diversity of these approaches and make side-by-side comparisons. Moreover, we discuss the applications of compacted DNNs in various IoT applications and outline future directions.
- Conference Article
- 10.1109/itme56794.2022.00034
- Nov 1, 2022
At present, deep neural networks (DNNs) have been widely used, and the deployment of DNNs to resource-constrained devices has become a popular trend, which leads to the problem of deep neural network compression. In this paper, we propose a channel-level deep neural network compression method, which aims to remove unimportant channels in the network, reduce the number of neural network parameters, and improve the performance of the compressed network. Specifically, to reduce channel redundancy more effectively, our approach introduces K-order statistics in the Batch Normalization (BN) layer, identifies and removes channels with low statistical values to generate a compact network, and improves the accuracy of the compressed network by fine-tuning. Our approach does not change the DNN architecture and does not require special hardware or software accelerators for the generated compressed network. Our method was tested on the public CIFAR-10 image classification dataset with various DNN models. Comparison with other model compression methods demonstrates its effectiveness.
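The channel-selection step can be sketched by ranking channels on a per-channel BN statistic and keeping the top fraction (plain magnitude of the BN scale here stands in for the paper's K-order statistics):

```python
def select_channels(bn_scales, keep_ratio):
    """Rank channels by the magnitude of their BN scale factor and
    keep the top fraction; the rest are pruned and the network is
    fine-tuned afterwards."""
    k = max(1, int(len(bn_scales) * keep_ratio))
    order = sorted(range(len(bn_scales)),
                   key=lambda i: abs(bn_scales[i]), reverse=True)
    return sorted(order[:k])  # indices of channels to keep
```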