Abstract

Most existing methods tackle text-based person retrieval from a spatial-wise perspective. In this paper, we address the task from a novel frequency-wise perspective. To this end, we propose to Exploit Extreme and Smooth Signals via Omni-frequency learning (EESSO) through a jointly optimized multi-stream architecture consisting of a Spatial Information Stream (SIS), an Extreme Signal Stream (ESS) and a Smooth Signal Stream (SSS). EESSO aims to exploit the complementarity between spatial-wise and frequency-wise features to improve retrieval performance. A novel Uncertainty-Guided Mutual Learning Mechanism (UG-MLM) is used during training. It not only enables the three streams to communicate with and learn from each other, but also models the data-dependent heteroscedastic uncertainty as a weight for knowledge transfer, so that each stream adaptively absorbs knowledge from the other two. Extensive experiments on the widely used CUHK-PEDES, RSTPReid and ICFG-PEDES datasets verify the effectiveness of EESSO. The results show that EESSO achieves state-of-the-art performance in supervised, weakly supervised and cross-domain text-based person retrieval settings.
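
To make the idea of uncertainty-weighted knowledge transfer between streams more concrete, the following is a minimal NumPy sketch, not the authors' implementation. The abstract does not give the loss formulation, so everything here is an assumption: each stream is assumed to output class logits plus a per-sample log-variance, the transfer signal is assumed to be a KL divergence between stream predictions, and the weighting follows the common heteroscedastic form exp(-s) * loss + s. The function names are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def kl_divergence(p, q, eps=1e-12):
    """Per-sample KL(p || q) for batches of probability vectors."""
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)


def uncertainty_weighted_mutual_loss(logits, log_vars):
    """Hypothetical mutual-learning loss for several streams.

    logits:   list of arrays with shape (batch, classes), one per stream.
    log_vars: list of arrays with shape (batch,), the assumed per-sample
              log-variance (heteroscedastic uncertainty) of each stream.
    Returns one scalar loss per stream.
    """
    probs = [softmax(stream_logits) for stream_logits in logits]
    losses = []
    for i in range(len(probs)):          # stream i is the learner
        total = 0.0
        for j in range(len(probs)):      # stream j is the peer it learns from
            if i == j:
                continue
            # Assumption: a peer that is uncertain about a sample
            # (large log-variance) contributes less to the transfer.
            weight = np.exp(-log_vars[j])
            per_sample = weight * kl_divergence(probs[j], probs[i]) + log_vars[j]
            total += per_sample.mean()
        losses.append(total)
    return losses


# Usage: three streams (standing in for SIS, ESS and SSS), 4 samples, 5 classes.
rng = np.random.default_rng(0)
stream_logits = [rng.normal(size=(4, 5)) for _ in range(3)]
stream_log_vars = [np.zeros(4) for _ in range(3)]
stream_losses = uncertainty_weighted_mutual_loss(stream_logits, stream_log_vars)
```

In this sketch the additive log-variance term keeps a stream from trivially reducing the loss by declaring every sample uncertain. In a real training setup these terms would be added to each stream's retrieval loss and optimized jointly.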
