Abstract

Pruning has been an effective solution for reducing the number of computations and the memory requirement in deep learning. The pruning unit plays an important role in exploiting GPU resources efficiently. The filter has been proposed as a simple pruning unit for structured pruning. However, because a filter is quite large as a pruning unit, the accuracy drop is considerable at high pruning ratios. GPUs rearrange the weight and input tensors into tiles (blocks) for efficient computation. To fully utilize GPU resources, this tile structure should be considered, which is the goal of block pruning. However, previous block pruning methods prune both row vectors and column vectors. Pruning a row vector in a tile corresponds to filter pruning, and it also interferes with column-wise block pruning in the following layer. In contrast, column vectors are much smaller than row vectors, so pruning them incurs a smaller accuracy drop. Additionally, if the pruning ratio differs across tiles, GPU utilization can be limited by the imbalanced workloads that irregularly sized blocks create. Applying the same pruning ratio to all weight tiles processed in parallel allows the actual inference process to fully utilize GPU resources without idle time. This paper proposes balanced column-wise block pruning, named BCBP, to satisfy two conditions: a column-wise, minimally sized pruning unit and balanced workloads. We demonstrate through comprehensive experiments that BCBP is superior to previous pruning methods.
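To make the balanced, column-wise idea concrete, the following is a minimal sketch of pruning the same number of low-norm column vectors from every tile of a weight matrix, so that tiles processed in parallel carry identical workloads. The function name, tile dimensions, and the L2-norm saliency criterion are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def balanced_columnwise_block_prune(weight, tile_rows=32, tile_cols=32, ratio=0.5):
    """Zero out the lowest-norm column vectors inside each tile of `weight`.

    Every tile drops the same number of columns, so the per-tile workload
    stays balanced. Sketch only: the actual tiling would follow the GPU's
    tensor layout, and the saliency measure is an assumption.
    """
    out = weight.copy()
    n_rows, n_cols = weight.shape
    k = int(tile_cols * ratio)  # columns pruned per tile -- identical for all tiles
    for r in range(0, n_rows, tile_rows):
        for c in range(0, n_cols, tile_cols):
            tile = out[r:r + tile_rows, c:c + tile_cols]  # view into `out`
            # Rank this tile's column vectors by L2 norm and zero the weakest k.
            norms = np.linalg.norm(tile, axis=0)
            prune_idx = np.argsort(norms)[:k]
            tile[:, prune_idx] = 0.0
    return out

# Example: prune a 64x64 weight matrix at a 50% per-tile ratio.
w = np.random.randn(64, 64).astype(np.float32)
pruned = balanced_columnwise_block_prune(w, ratio=0.5)
```

Because each tile keeps exactly `tile_cols - k` columns, no tile becomes a straggler during parallel execution, which is the balance condition the abstract describes.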
