Parallel Blockwise Knowledge Distillation for Deep Neural Network Compression

Cody Blakeney, Yan Yan, Xiaomin Li, Ziliang Zong

doi:10.1109/tpds.2020.3047003

Open Access

https://doi.org/10.1109/tpds.2020.3047003

Journal: IEEE Transactions on Parallel and Distributed Systems
Publication Date: Feb 2, 2021
Citations: 48
License type: publisher-specific, author manuscript

Affiliation: Texas State University

Abstract

Deep neural networks (DNNs) have been extremely successful in solving many challenging AI tasks in natural language processing, speech recognition, and computer vision. However, DNNs are typically computation intensive, memory demanding, and power hungry, which significantly limits their use on resource-constrained platforms. Therefore, a variety of compression techniques (e.g., quantization, pruning, and knowledge distillation) have been proposed to reduce the size and power consumption of DNNs. Blockwise knowledge distillation is one compression technique that can effectively reduce the size of a highly complex DNN. However, it is not widely adopted because of its long training time. In this article, we propose a novel parallel blockwise distillation algorithm to accelerate the distillation of sophisticated DNNs. Our algorithm leverages local information to conduct independent blockwise distillation, uses depthwise separable layers as an efficient replacement block architecture, and addresses the factors that limit parallelism (e.g., dependency, synchronization, and load balancing). Experimental results on an AMD server with four GeForce RTX 2080Ti GPUs show that our algorithm achieves a 3x speedup plus 19 percent energy savings on VGG distillation, and a 3.5x speedup plus 29 percent energy savings on ResNet distillation, both with negligible accuracy loss. The speedup of ResNet distillation can be further improved to 3.87x when using four RTX6000 GPUs in a distributed cluster.
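The abstract says each student block is distilled independently using local information, which is what lets the blocks train in parallel. The following is a toy sketch of that idea, not the article's implementation, under loudly stated assumptions: the teacher blocks are hypothetical scalar affine functions rather than convolutional layers, each student block is a scalar affine map (not the depthwise separable block the article uses), and "local information" is interpreted here as feeding each student block the teacher's own input to the corresponding block, with the teacher's block output as the regression target. All function names are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Hypothetical teacher: a stack of blocks, each a fixed scalar affine function.
# (Assumption: real blocks would be convolutional layers, not scalar maps.)
TEACHER_BLOCKS = [
    lambda x: 2.0 * x + 1.0,
    lambda x: 0.5 * x - 3.0,
    lambda x: -1.5 * x + 0.25,
]


def teacher_activations(x, blocks):
    """Return the teacher's input followed by the output of every block."""
    acts = [x]
    for block in blocks:
        acts.append(block(acts[-1]))
    return acts


def distill_block(samples, epochs=500, lr=0.01):
    """Fit a toy student block y = w*x + b to (teacher_in, teacher_out) pairs.

    Uses per-sample gradient descent on the squared error; this stands in for
    training a real replacement block against the teacher block's output.
    """
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in samples:
            err = (w * x + b) - y
            w -= lr * 2.0 * err * x
            b -= lr * 2.0 * err
    return w, b


def parallel_blockwise_distillation(blocks, n_samples=50, seed=0):
    """Distill every block independently and concurrently.

    Block i is trained only on teacher activations (input acts[i], target
    acts[i + 1]), so no student block depends on another student block.
    """
    rng = random.Random(seed)
    acts = [teacher_activations(rng.uniform(-1.0, 1.0), blocks)
            for _ in range(n_samples)]
    datasets = [[(a[i], a[i + 1]) for a in acts] for i in range(len(blocks))]
    with ThreadPoolExecutor(max_workers=len(blocks)) as pool:
        return list(pool.map(distill_block, datasets))


def student_forward(x, params):
    """Run the assembled student network: the distilled blocks in sequence."""
    for w, b in params:
        x = w * x + b
    return x


if __name__ == "__main__":
    params = parallel_blockwise_distillation(TEACHER_BLOCKS)
    print(params)  # each (w, b) should be close to its teacher block's coefficients
    print(student_forward(0.3, params), teacher_activations(0.3, TEACHER_BLOCKS)[-1])
```

Because no student block waits on another, the per-block fits are independent jobs. The thread pool only illustrates that structure: in CPython the global interpreter lock keeps this pure-Python loop from running faster in threads, so a practical version would spread the blocks across separate processes or GPUs instead.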

Similar Papers

RAT: RNN-Attention Transformer for Speech Enhancement
Tailong Zhang ... Hao Li
11 Dec 2022

A Deep Relevance Matching Model for Ad-hoc Retrieval
Jiafeng Guo ... W Bruce Croft
24 Oct 2016

Neu-IR
Nick Craswell ... Jiafeng Guo
07 Jul 2016

Differentially private optimization algorithms for deep neural networks
Roan Gylberth ... Setiadi Yazid
01 Oct 2017
