Rapid and accurate localization of a target person via person re-identification is an important technique in intelligent security systems, but it faces great challenges from complex environmental changes. To build models applicable to a variety of complex scenes, deep learning-based person re-identification has become a key research direction. Existing deep learning models based on convolutional neural networks or transformers still have shortcomings in extracting local details and do not cope well with realistic complex scenes. To address this problem, a method fusing a convolutional neural network and a transformer (FCAT) is proposed to enhance the transformer's attention to local detail information. The method indirectly improves the transformer's ability to extract local detail features by embedding convolutional spatial attention and channel attention, which strengthen attention to important regions and important channel features in the image, respectively. Comparative and ablation experiments on three publicly available person re-identification datasets demonstrate that FCAT achieves results comparable to existing methods on non-occluded datasets and significantly improves performance on occluded datasets. The proposed model is also lightweight: inference speed improves without any additional computation or model parameters.