Abstract

• We propose an efficient hybrid ReID backbone for discriminative feature extraction.
• We formulate an efficient PW-MSA block to free positions from fixed input lengths.
• We put forward a novel attention-guided GCN model to encode person attributes and body parts into embedding representations.

The Transformer has been applied to computer vision to explore long-range dependencies with its multi-head self-attention strategy, and numerous Transformer-based methods for person re-identification (ReID) have been designed to extract effective and robust representations. However, the memory and computational complexity of the Transformer's scaled dot-product attention incur substantial overhead. To overcome these limitations, this paper presents the ResT-ReID method, which designs a hybrid backbone, Res-Transformer, based on ResNet-50 and Transformer blocks to capture effective identity information. Specifically, we use global self-attention in place of the depth-wise convolution in the residual bottleneck of the fourth layer of ResNet-50. To fully exploit the entire knowledge of the person, we devise attention-guided Graph Convolutional Networks with side information embedding (SIE-AGCN), which place an attention layer between two GCN layers. Quantitative experiments on two large-scale ReID benchmarks demonstrate that the proposed ResT-ReID achieves competitive results compared with several state-of-the-art approaches.
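
To make the backbone design concrete, the following is a minimal PyTorch sketch of the core idea: a ResNet-style bottleneck whose spatial (depth-wise) convolution is replaced by global multi-head self-attention. It is an illustration under stated assumptions, not the authors' implementation; the names (AttnBottleneck, mid_ch, heads) are hypothetical, and ResT-ReID's actual PW-MSA block and positional handling may differ.

```python
import torch
import torch.nn as nn

class AttnBottleneck(nn.Module):
    """Illustrative bottleneck: 1x1 reduce -> global self-attention -> 1x1 expand."""
    def __init__(self, in_ch, mid_ch, heads=4):  # mid_ch must be divisible by heads
        super().__init__()
        # 1x1 reduction, as in a standard ResNet-50 bottleneck
        self.reduce = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 1, bias=False),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True))
        # global self-attention in place of the spatial convolution
        self.attn = nn.MultiheadAttention(mid_ch, heads, batch_first=True)
        # 1x1 expansion back to the residual width
        self.expand = nn.Sequential(
            nn.Conv2d(mid_ch, in_ch, 1, bias=False),
            nn.BatchNorm2d(in_ch))
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):                       # x: (B, C, H, W)
        y = self.reduce(x)
        b, c, h, w = y.shape
        seq = y.flatten(2).transpose(1, 2)      # (B, H*W, C): one token per position
        seq, _ = self.attn(seq, seq, seq)       # global token mixing over all positions
        y = seq.transpose(1, 2).reshape(b, c, h, w)
        return self.act(x + self.expand(y))     # residual connection
```

The SIE-AGCN component can be sketched in the same spirit: a GCN layer, an attention layer, and a second GCN layer, with a learned side-information embedding (e.g. a camera index) added to the node features before graph propagation. Again, all names (SIEAGCN, num_sides, side_id) are assumptions, and the paper's exact formulation may differ.

```python
import torch.nn as nn

class SIEAGCN(nn.Module):
    """Illustrative GCN -> attention -> GCN stack with side information embedding."""
    def __init__(self, dim, num_sides, heads=4):
        super().__init__()
        self.side_emb = nn.Embedding(num_sides, dim)  # side information embedding
        self.gcn1 = nn.Linear(dim, dim)               # first GCN layer: A @ X @ W1
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gcn2 = nn.Linear(dim, dim)               # second GCN layer: A @ X @ W2
        self.act = nn.ReLU(inplace=True)

    def forward(self, x, adj, side_id):
        # x: (B, N, dim) features for N attribute/body-part nodes
        # adj: (N, N) normalized adjacency; side_id: (B,) side-information index
        x = x + self.side_emb(side_id).unsqueeze(1)   # inject side information
        x = self.act(adj @ self.gcn1(x))              # GCN layer 1
        x, _ = self.attn(x, x, x)                     # attention between the two GCN layers
        x = self.act(adj @ self.gcn2(x))              # GCN layer 2
        return x.mean(dim=1)                          # pooled graph-level embedding
```

In use, adj would be a normalized adjacency over the attribute and body-part nodes, and side_id a per-image camera or viewpoint index.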
