Toward Compact Transformers for End-to-End Object Detection With Decomposed Chain Tensor Structure

Peining Zhen,Hai-Bao Chen,Wei Wang,Tianshu Hou,Xiaotao Yan,Hao Wei

doi:10.1109/tcsvt.2022.3208062

Abstract

DEtection TRansformer (DETR) is a recently proposed method that streamlines the detection pipeline and achieves competitive results against two-stage detectors such as Faster-RCNN. DETR models dispense with complex anchor generation and post-processing procedures, thereby making the detection pipeline more intuitive. However, the numerous redundant parameters in transformers make the computation and storage of DETR models intensive, which seriously hinders their deployment on resource-constrained devices. In this paper, to obtain a compact end-to-end detection framework, we propose to deeply compress the transformers with low-rank tensor decomposition. The basic idea of our tensor-based compression method is to represent the large-scale weight matrix in one network layer with a chain of low-order matrices. Furthermore, we show that redundant attention heads hinder the performance of detection transformers. We thus propose a gated multi-head attention (GMHA) module to suppress the redundant attention information by normalizing the attention heads. In GMHA, each attention head has an independent gate that determines the passed attention value, thereby down-weighting the uninformative heads. Applying GMHA modules mitigates the accuracy drop of the tensor-compressed DETR models. Lastly, to obtain fully compressed DETR models, a low-bitwidth quantization technique is introduced to further reduce the model storage size. Based on the proposed methods, we achieve significant parameter and model size reduction while maintaining high detection performance. We conduct extensive experiments on the COCO and PASCAL VOC datasets to validate the effectiveness of our tensor-compressed (tensorized) DETR models. The experimental results on the COCO benchmark show that we can attain $3.7\times$ full model compression with $482\times$ feed-forward network (FFN) parameter reduction and only a 0.6-point accuracy drop.
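
The idea of replacing one large weight matrix with a chain of small factors can be illustrated with a minimal sketch. It assumes the chain is a tensor-train (TT) matrix format, which the abstract does not spell out. The function names `tt_param_count` and `tt_to_dense`, the 256 x 2048 layer size, the mode factorization, and the ranks are all hypothetical choices made for illustration and are not taken from the paper.

```python
import numpy as np


def tt_param_count(row_modes, col_modes, ranks):
    """Parameters stored by a TT-matrix whose k-th core has shape
    (ranks[k], row_modes[k], col_modes[k], ranks[k + 1])."""
    if not (len(row_modes) == len(col_modes) == len(ranks) - 1):
        raise ValueError("need one (row, col) mode pair per core and len(ranks) = cores + 1")
    if ranks[0] != 1 or ranks[-1] != 1:
        raise ValueError("boundary ranks must be 1")
    return sum(
        ranks[k] * row_modes[k] * col_modes[k] * ranks[k + 1]
        for k in range(len(row_modes))
    )


def tt_to_dense(cores):
    """Contract the chain of 4-way cores back into one dense matrix."""
    result = cores[0]  # shape (1, m_1, n_1, r_1)
    for core in cores[1:]:
        # (1, M, N, r) x (r, m, n, r') -> (1, M, m, N, n, r')
        result = np.einsum("aMNr,rmnb->aMmNnb", result, core)
        a, M, m, N, n, b = result.shape
        # Merge the row modes and the column modes accumulated so far.
        result = result.reshape(a, M * m, N * n, b)
    return result[0, :, :, 0]


# Hypothetical factorization of a 256 x 2048 layer (shapes chosen for illustration).
row_modes = (4, 4, 4, 4)   # 4 * 4 * 4 * 4 = 256
col_modes = (4, 8, 8, 8)   # 4 * 8 * 8 * 8 = 2048
ranks = (1, 4, 4, 4, 1)    # hypothetical TT ranks

rng = np.random.default_rng(0)
cores = [
    rng.standard_normal((ranks[k], row_modes[k], col_modes[k], ranks[k + 1]))
    for k in range(len(row_modes))
]

dense = tt_to_dense(cores)
tt_params = tt_param_count(row_modes, col_modes, ranks)
print("dense shape:", dense.shape)
print("dense parameters:", dense.size)
print("TT parameters:", tt_params)
print("reduction factor: %.1f" % (dense.size / tt_params))
```

With these hypothetical shapes the chain stores 1,216 values against 524,288 for the dense matrix. The figures only illustrate how the storage cost scales with the ranks and mode sizes; they are not the paper's reported results.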

Journal: IEEE Transactions on Circuits and Systems for Video Technology
Publication Date: Feb 1, 2023
Citations: 7

Similar Papers

Deeply Tensor Compressed Transformers for End-to-End Object Detection
Peining Zhen ... Hai-Bao Chen
Proceedings of the AAAI Conference on Artificial Intelligence | VOL. 36
28 Jun 2022

Towards Understanding Neural Machine Translation with Attention Heads’ Importance
Zijie Zhou ... Junguo Zhu
Applied Sciences | VOL. 14
27 Mar 2024

ATICVis: A Visual Analytics System for Asymmetric Transformer Models Interpretation and Comparison
Jian-Lin Wu ... Ko-Chih Wang
Applied Sciences | VOL. 13
26 Jan 2023

Stochastic Attention Head Removal: A Simple and Effective Method for Improving Transformer Based ASR Models
Shucong Zhang ... Steve Renals
30 Aug 2021
