In person re-identification (Re-ID) tasks, using only convolutional neural networks (CNNs) makes it difficult to fully exploit the global key information in person images, while using only the Transformer structure fails to adequately capture the multi-scale local detail in those images. Therefore, to address more challenging person Re-ID tasks, effectively integrating CNNs and Transformers into a single algorithmic framework capable of fusing multi-dimensional features has become a focal point of current research. Because the attention mechanism can suppress the extraction of non-salient features by convolutional neural networks, and the dual-channel multi-scale mechanism can help the Transformer extract finer-grained deep information, we propose a network Based on Efficient Attention Mechanism and Single-channel Dual-channel Fusion with Transformer Features Aggregation (ADT). First, we introduce the Convolutional Block Attention Module (CBAM) and the Spatial Reconstruction Unit (SRU) into the CNN to construct a Multi-dimensional Attention Features Re-fusion Network (MCF). Weighting the local features extracted by the convolutional network strengthens the model's attention to person detail features. Then, we propose the Single and Dual-channel Integrated Transformer Features Aggregator (SDT) and overlay it on the MCF. This integration is designed to aggregate the multi-scale, high-dimensional global features from each layer of the CNN more effectively. Experimental results on the Market1501, DukeMTMC, MSMT17, and CUHK03 datasets demonstrate that our model yields significantly richer feature representations than most existing methods.
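To make the idea of attention-based feature weighting concrete, the following is a minimal sketch of CBAM-style channel attention only: each channel is summarized by average and max pooling, both summaries pass through a shared two-layer MLP, and the sigmoid of their sum rescales the channel. It is written in plain Python for illustration and is not the MCF implementation. The function names, the toy feature map, and the hand-picked weights `w1` and `w2` are assumptions. CBAM's spatial attention branch and the SRU are omitted.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def shared_mlp(vec, w1, w2):
    # Two-layer MLP with ReLU in between; the same weights are
    # applied to both the average-pooled and max-pooled descriptors.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(feature_map, w1, w2):
    """Reweight a C x H x W feature map (nested lists) per channel."""
    avg_desc, max_desc = [], []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        avg_desc.append(sum(values) / len(values))  # global average pooling
        max_desc.append(max(values))                # global max pooling

    avg_out = shared_mlp(avg_desc, w1, w2)
    max_out = shared_mlp(max_desc, w1, w2)

    # One weight in (0, 1) per channel.
    weights = [sigmoid(a + m) for a, m in zip(avg_out, max_out)]

    # Scale every activation in a channel by that channel's weight.
    reweighted = [
        [[v * wt for v in row] for row in channel]
        for channel, wt in zip(feature_map, weights)
    ]
    return weights, reweighted


# Toy input (hypothetical): one flat channel and one active channel.
feature_map = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[1.0, 2.0], [3.0, 4.0]],
]
# Hand-picked weights (hypothetical, not learned): 2 -> 1 -> 2 bottleneck.
w1 = [[1.0, 1.0]]
w2 = [[-1.0], [1.0]]

weights, reweighted = channel_attention(feature_map, w1, w2)
print(weights)
```

With these toy weights, the flat channel receives a weight close to 0 and the active channel a weight close to 1, which is the suppression of non-salient features that the attention mechanism is meant to provide. In a trained network the MLP weights are learned rather than fixed.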