Abstract

Deep neural network models have achieved great success in various fields, including computer vision and recommendation systems. However, deep models are usually large and computationally expensive. Reducing the size of deep models and speeding up the learning process without a sharp drop in accuracy has become a promising goal, both in research and in practice. Channel pruning is one of the most effective methods: it not only compresses model size but also directly speeds up inference. In this paper, we propose a novel channel attention block, the parallel pooling attention block (PPAB), designed on top of squeeze-and-excitation (SE) blocks. PPAB introduces two improvements. First, a parallel max-pooling branch is added alongside the pooling branch of the SE block. Second, dimensionality reduction is avoided during the excitation phase. Both optimizations improve PPAB's ability to measure channel importance. Experimental results show that PPAB outperforms standard SE blocks on the channel attention objective. The proposed pruning method can be applied efficiently in both computer vision and recommendation systems.
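To make the two described changes concrete, the following is a minimal sketch of a PPAB-style block in PyTorch. It assumes the max-pooling and average-pooling descriptors are fused by element-wise summation and that a single full-width linear layer replaces the usual bottleneck; these fusion and excitation details are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn


class ParallelPoolingAttentionBlock(nn.Module):
    """Sketch of a PPAB-style channel attention block.

    Differences from a standard SE block, per the abstract:
      1. A parallel global max-pooling branch alongside average pooling.
      2. No dimensionality reduction (no C -> C/r bottleneck) in the
         excitation step; a single full-width linear layer is used here.
    Fusing the two branches by summation is an assumption made for this sketch.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        # Excitation without reduction: C -> C, shared across both branches.
        self.fc = nn.Linear(channels, channels, bias=False)
        self.gate = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        avg_desc = self.avg_pool(x).view(b, c)   # squeeze via average pooling
        max_desc = self.max_pool(x).view(b, c)   # parallel max-pooling branch
        # Fuse the two descriptors (assumed: element-wise sum), then gate.
        attn = self.gate(self.fc(avg_desc) + self.fc(max_desc)).view(b, c, 1, 1)
        return x * attn                          # channel-wise reweighting


if __name__ == "__main__":
    block = ParallelPoolingAttentionBlock(channels=64)
    feat = torch.randn(2, 64, 32, 32)
    print(block(feat).shape)  # torch.Size([2, 64, 32, 32])
```

In a pruning setting, the learned channel attention weights can serve as importance scores, so channels that are consistently assigned low attention become candidates for removal.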
