Abstract

Fine-grained information has proven helpful for person re-identification, and the multi-head attention mechanism offers a feasible way to capture it. However, we observe severe redundancy among the multiple branches, which can make the learned representation over-emphasize certain discriminative regions and correspondingly ignore other potentially informative ones. We tackle this issue from two aspects, yielding the Complementation-Reinforced Attention Network (CRAN). The first is the redundancy among branches, for which we propose imposing complementing constraints on the multiple attention heads. The constraints are two-fold: on the one hand, they encourage each branch to attend to complementary regions; on the other hand, they enforce orthogonality among the learned features of different regions in the embedding space. The second is the redundancy among query positions within each attention head, which we address by sparsifying the query positions, thereby simplifying the attention block. In addition, to achieve efficient retrieval, we propose an adaptive feature fusion method for dimensionality reduction. Compared with the commonly used feature ensemble, our method effectively reduces the dimensionality while preserving discriminative ability. We demonstrate the effectiveness of our method on the MSMT17, Market-1501, DukeMTMC-reID, and CUHK03 datasets.
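
The abstract does not state the exact form of the complementing constraints, so the following is only a minimal sketch of one common way such penalties could be written, not the authors' formulation. It assumes each attention head yields one attention map over spatial positions and one embedding vector. The function names `attention_overlap_penalty` and `orthogonality_penalty`, the array shapes, and the choice of a squared Frobenius penalty are all illustrative assumptions.

```python
import numpy as np


def orthogonality_penalty(features):
    """Hypothetical penalty: squared Frobenius norm of (Gram matrix - identity).

    features: (num_heads, dim) array, one embedding per attention head.
    The value is 0 when the L2-normalized head embeddings are mutually orthogonal.
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)  # guard against zero vectors
    gram = unit @ unit.T                        # pairwise cosine similarities
    return float(np.sum((gram - np.eye(features.shape[0])) ** 2))


def attention_overlap_penalty(attn):
    """Hypothetical penalty: total pairwise overlap between attention maps.

    attn: (num_heads, num_positions) array of non-negative attention weights.
    The value is 0 when no two heads put weight on the same position.
    """
    overlap = attn @ attn.T                       # inner product of each pair of maps
    off_diagonal = ~np.eye(attn.shape[0], dtype=bool)
    return float(overlap[off_diagonal].sum())     # ignore each head's self-overlap


# Two heads attending to disjoint halves of four positions, with orthogonal features.
attn = np.array([[0.5, 0.5, 0.0, 0.0],
                 [0.0, 0.0, 0.5, 0.5]])
feats = np.array([[1.0, 0.0],
                  [0.0, 3.0]])
total = attention_overlap_penalty(attn) + orthogonality_penalty(feats)
```

In this sketch both terms vanish for the example inputs, since the two heads attend to disjoint positions and produce orthogonal embeddings. Redundant heads, with overlapping maps or parallel embeddings, would give a positive penalty that could be added to a training loss.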
