CNN with Embedding Transformers for Person Reidentification

Mo Jianwen, Yuan Hua, Chen Lingping, Mo Lunlin, Lin Leping, Michal Kawulok

doi:10.1155/2023/4591991

Abstract

For person reidentification (ReID), most slicing methods, such as the part-based convolutional baseline (PCB) and AlignedReID, introduce a large amount of background that contains no pedestrian parts, which causes cross-aliasing of features in the deep network. In addition, the resulting part features are not well aligned with each other, which degrades model performance. We propose a person ReID network architecture, the convolutional neural network (CNN) with embedding transformers (CET), that combines the respective advantages of CNNs and transformers. In CET, first, a residual transformer (RT) structure is embedded in the CNN backbone to obtain a feature extractor named transformers in CNN; the transformer's strength in capturing the relevance of global information reduces the feature aliasing. Second, a feature fusion structure with a learnable vector is added at the output of the transformer at the end of the network to fuse the output vectors. A two-branch loss structure is designed to balance the two different fusion strategies. Finally, the self-attention mechanism of the transformer is used to align human body parts automatically, which addresses the part misalignment caused by inaccurate detection boxes. Experimental results show that the CET network architecture achieves better performance than PCB and some other block-slicing methods.

Full Text