Abstract

Domain generalization in person re-identification (ReID) aims to design a generalizable model that is trained under the supervision of a set of labeled source domains and can be directly deployed on unseen domains. Existing approaches simply treat each identity as a distinct class and ignore the differences among cameras. We argue that camera information is crucial for learning discriminative representations, as people's behavior usually varies across cameras. In this paper, we present a Multi-Centroid Memory (MCM) to capture the camera-specific information of each identity, and a Soft Triple Hard (ST-Hard) loss to align the information of the same identity across cameras. Furthermore, in contrast to the traditional approach of training a single model with a parallel training mechanism, we propose Recurrent Implicit Lifelong Learning (RILL), which feeds the source domains into the model in a continuous loop to train an expert for each domain. To further generalize each expert to the other source domains, RILL adopts a style-replay-based method during training on the current domain to simulate training on the previous domain, encouraging each domain's expert to extract generalizable features. We also present Earth Mover's Test-time Adaption (EMTA), used in conjunction with RILL, which allows source domains that are more similar to the test domain to play a more significant role at test time. This is achieved by our proposed Earth Mover's Similarity (EMS), which models the similarity between the source and test domains. Extensive experiments under two evaluation protocols demonstrate the generalization ability and competitiveness of our framework.
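The abstract only describes EMTA and EMS at a high level. As a purely illustrative sketch (not the authors' implementation), the snippet below shows one way such a test-time weighting scheme could look: the feature distribution of each source domain is compared to that of the test domain with an earth mover's (Wasserstein) distance, and each domain expert's scores are weighted by the resulting similarity. The function names, the per-dimension distance averaging, and the softmax weighting are assumptions made for this example only.

```python
# Illustrative sketch only -- not the paper's EMTA/EMS implementation.
# Assumptions: each source-domain expert provides a set of feature vectors and
# a score array; per-dimension Wasserstein distances and softmax weighting are
# simplifications chosen for this example.
import numpy as np
from scipy.stats import wasserstein_distance


def ems_similarity(source_feats: np.ndarray, test_feats: np.ndarray) -> float:
    """Earth Mover's-style similarity between two feature sets.

    Computes the 1-D Wasserstein distance per feature dimension, averages it,
    and converts the distance into a similarity score in (0, 1].
    """
    dists = [
        wasserstein_distance(source_feats[:, d], test_feats[:, d])
        for d in range(source_feats.shape[1])
    ]
    return 1.0 / (1.0 + float(np.mean(dists)))


def emta_fuse(expert_scores: list,
              source_feat_sets: list,
              test_feats: np.ndarray) -> np.ndarray:
    """Fuse per-domain expert scores so that experts whose source domain is
    closer to the test domain (in the EMS sense) contribute more."""
    sims = np.array([ems_similarity(f, test_feats) for f in source_feat_sets])
    weights = np.exp(sims) / np.exp(sims).sum()  # softmax over similarities
    return sum(w * s for w, s in zip(weights, expert_scores))
```

In the actual framework the compared distributions and the fusion rule are defined by the paper; the sketch only makes concrete the idea that source domains more similar to the test domain play a larger role at test time.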
