Abstract

Occluded person re-identification (Re-ID) is a challenging task, as pedestrians are often obstructed by various occlusions, such as non-pedestrian objects or non-target pedestrians. Previous methods have heavily relied on auxiliary models to obtain information in unoccluded regions, such as human pose estimation. However, these auxiliary models fall short in accounting for pedestrian occlusions, thereby leading to potential misrepresentations. In addition, some previous works learned feature representations from single images, ignoring the potential relations among samples. To address these issues, this paper introduces a Multi-Level Relation-Aware Transformer (MLRAT) model for occluded person Re-ID. This model mainly encompasses two novel modules: Patch-Level Relation-Aware (PLRA) and Sample-Level Relation-Aware (SLRA). PLRA learns fine-grained local features by modeling the structural relations between key patches, bypassing the dependency on auxiliary models. It adopts a model-free method to select key patches that have high semantic correlation with the final pedestrian representation. In particular, to alleviate the interference of occlusion, PLRA captures the structural relations among key patches via a two-layer Graph Convolution Network (GCN), effectively guiding the local feature fusion and learning. SLRA is designed to facilitate the model to learn discriminative features by modeling the relations among samples. Specifically, to mitigate noisy relations of irrelevant samples, we present a Relation-Aware Transformer (RAT) block to capture the relations among neighbors. Furthermore, to bridge the gap between training and testing phases, a self-distillation method is employed to transfer the sample-level relations captured by SLRA to the backbone. Extensive experiments are conducted on four occluded datasets, two partial datasets and two holistic datasets. The results show that the proposed MLRAT model significantly outperforms existing baselines on four occluded datasets, while maintains top performance on two partial datasets and two holistic datasets.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call