Abstract
Person re-identification (re-ID) faces challenges such as high inter-person similarity and occlusion. In this paper, we introduce an occluded pedestrian re-ID method based on Multiple Fusion and Semantic feature Mining (MFSM) to mitigate the adverse effects of occlusion on feature representation. Specifically, we devise a Multi-branch Strategic Enhancement Module (SEM) that makes data augmentation more robust and more consistent at countering data memorization. The module emulates various real-world disturbances by applying distinct, independent augmentation strategies in its branches. Furthermore, we propose the Triplet Cross Unit (TCU) to fully exploit and consolidate regional and relational visual cues as well as multi-level features. By transferring local patterns such as edges, textures, and colors from the lower CNN layers to the shallow transformer layers, the TCU helps the transformer acquire a notion of translation invariance early in the network. At the same time, deeper transformer features provide more abstract semantic representations that complement the CNN's high-level semantic features. Finally, we introduce the Global Squeeze-Excitation Fusion (GSEF) module to address the separation between the global features that the different models produce in their final outputs. GSEF uses an attention mechanism to selectively merge these features, so that the valid global features are fully utilized. Extensive experiments show that our model achieves state-of-the-art performance on occluded pedestrians, validating the effectiveness of our method.
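As an illustration of the multi-branch idea behind SEM, the following Python sketch sends one image through several independent augmentation branches, each with its own random generator. The three strategies (horizontal flip, brightness jitter, and random erasing as a stand-in for occlusion), the per-branch seeding, the list-of-lists image type, and all function names are hypothetical choices made for this example; the abstract does not specify which strategies SEM actually uses.

```python
import random
from typing import Callable, List

Image = List[List[float]]  # hypothetical grayscale image, pixel values in [0, 1]


def horizontal_flip(img: Image, rng: random.Random) -> Image:
    # Mirror each row; a simple stand-in for a viewpoint change.
    return [row[::-1] for row in img]


def brightness_jitter(img: Image, rng: random.Random, max_delta: float = 0.2) -> Image:
    # Shift every pixel by one random offset, clipped to [0, 1]; mimics lighting changes.
    delta = rng.uniform(-max_delta, max_delta)
    return [[min(1.0, max(0.0, p + delta)) for p in row] for row in img]


def random_erase(img: Image, rng: random.Random, frac: float = 0.3) -> Image:
    # Zero out a random rectangle spanning about `frac` of each side; mimics an occluder.
    h, w = len(img), len(img[0])
    eh, ew = max(1, int(h * frac)), max(1, int(w * frac))
    top, left = rng.randint(0, h - eh), rng.randint(0, w - ew)
    out = [row[:] for row in img]  # copy so the input image is left untouched
    for i in range(top, top + eh):
        for j in range(left, left + ew):
            out[i][j] = 0.0
    return out


Strategy = Callable[[Image, random.Random], Image]


def multi_branch_augment(img: Image, strategies: List[Strategy], seed: int = 0) -> List[Image]:
    # Hypothetical SEM-style wrapper: each branch gets its own seeded RNG,
    # so the branches are independent of one another and reproducible.
    views = []
    for k, strategy in enumerate(strategies):
        rng = random.Random(seed * 1000 + k)
        views.append(strategy(img, rng))
    return views


if __name__ == "__main__":
    img = [[0.5] * 8 for _ in range(8)]
    views = multi_branch_augment(img, [horizontal_flip, brightness_jitter, random_erase])
    print(len(views))  # 3
```

Giving each branch its own generator keeps one strategy's randomness from influencing another's, which is one simple way to make the branches "distinct and independent" in the sense described above.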
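The attention-based selective merging in GSEF can be sketched as a squeeze-and-excitation style gate over two pooled global feature vectors, for instance one from a CNN branch and one from a transformer branch. This is an assumption-laden illustration, not the module from the paper: the additive combination of the two inputs, the bottleneck weights `w1` and `w2`, the convex per-channel blend, and the function names are all hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    # Plain matrix-vector product: one dot product per row of m.
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |x|).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def se_gated_fusion(g_cnn: Vector, g_vit: Vector, w1: Matrix, w2: Matrix) -> Vector:
    """Hypothetical SE-style gate that mixes two global features channel by channel.

    w1: (d/r x d) reduction weights; w2: (d x d/r) expansion weights.
    """
    assert len(g_cnn) == len(g_vit)
    # Combine the two global descriptors; they are already pooled,
    # so no spatial "squeeze" step is needed here.
    z = [a + b for a, b in zip(g_cnn, g_vit)]
    # Excitation: bottleneck FC -> ReLU -> FC -> sigmoid gives one weight per channel.
    hidden = [max(0.0, h) for h in matvec(w1, z)]
    gate = [sigmoid(s) for s in matvec(w2, hidden)]
    # Convex per-channel blend: gate near 1 favours the CNN feature, near 0 the transformer one.
    return [s * a + (1.0 - s) * b for s, a, b in zip(gate, g_cnn, g_vit)]
```

With all-zero weights every gate equals 0.5, so the fused vector is simply the channel-wise average of the two inputs; learned weights would instead let the gate favour, channel by channel, whichever branch carries the more useful global evidence.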