Abstract

Deep neural network models have achieved great success in various fields, including computer vision and recommendation systems. However, deep models are usually large and computationally expensive. Reducing the size of deep models and speeding up the learning process without a sharp drop in accuracy has become a promising goal, both in research and in practice. Channel pruning is one of the most effective methods: it not only compresses model size but also directly speeds up inference. In this paper, we propose a novel channel attention block, the parallel pooling attention block (PPAB), designed on top of squeeze-and-excitation (SE) blocks. PPAB introduces two improvements. First, a parallel max-pooling branch is added alongside the pooling branch of the SE block. Second, dimensionality reduction is avoided during the excitation phase. Both optimizations improve PPAB's ability to measure channel importance. Experimental results show that PPAB outperforms standard SE blocks on the channel attention objective. The proposed pruning method can be applied efficiently in both computer vision and recommendation systems.
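To make the two described changes concrete, the following is a minimal sketch of a PPAB-style block in PyTorch. It assumes the max-pooling and average-pooling descriptors are fused by element-wise summation and that a single full-width linear layer replaces the usual bottleneck; these fusion and excitation details are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn


class ParallelPoolingAttentionBlock(nn.Module):
    """Sketch of a PPAB-style channel attention block.

    Differences from a standard SE block, per the abstract:
      1. A parallel global max-pooling branch alongside average pooling.
      2. No dimensionality reduction (no C -> C/r bottleneck) in the
         excitation step; a single full-width linear layer is used here.
    Fusing the two branches by summation is an assumption made for this sketch.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        # Excitation without reduction: C -> C, shared across both branches.
        self.fc = nn.Linear(channels, channels, bias=False)
        self.gate = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        avg_desc = self.avg_pool(x).view(b, c)   # squeeze via average pooling
        max_desc = self.max_pool(x).view(b, c)   # parallel max-pooling branch
        # Fuse the two descriptors (assumed: element-wise sum), then gate.
        attn = self.gate(self.fc(avg_desc) + self.fc(max_desc)).view(b, c, 1, 1)
        return x * attn                          # channel-wise reweighting


if __name__ == "__main__":
    block = ParallelPoolingAttentionBlock(channels=64)
    feat = torch.randn(2, 64, 32, 32)
    print(block(feat).shape)  # torch.Size([2, 64, 32, 32])
```

In a pruning setting, the learned channel attention weights can serve as importance scores, so channels that are consistently assigned low attention become candidates for removal.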
