Abstract

Extracting robust feature representations is one of the key challenges in the person re-identification (ReID) task. Although convolutional neural network (CNN)-based methods have achieved great success, they still cannot handle part occlusion and misalignment well, owing to their limited receptive field. Recently, pure transformer models have shown their power in the person ReID task. However, current transformer models take equal-scale patches as input and cannot properly handle cross-scale interaction. To overcome this problem, an adaptive cross-scale transformer designed from the perspective of graph signals, named ACSFormer, is proposed. Specifically, the self-attention module is first treated as an undirected, fully connected graph. Then, "node variation" is introduced as an indicator for adaptively merging neighbourhood tokens. To the best of the authors' knowledge, ACSFormer is the first work to combine pure transformers and graph signal processing in the field of person ReID. Extensive evaluations are conducted on three person ReID datasets to validate the performance of ACSFormer. Experiments demonstrate that ACSFormer performs on par with state-of-the-art CNN-based methods and consistently improves on the transformer-based baseline, for example, surpassing the ViT baseline by 2.5%, 2.7% and 4.8% mAP on Market1501, DukeMTMC-reID and MSMT17, respectively.
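
The two steps named above (reading self-attention as an undirected, fully connected graph, then merging neighbouring tokens according to a per-node variation score) can be illustrated with a minimal NumPy sketch. This is not the ACSFormer implementation: the abstract does not define "node variation", so the sketch assumes a common graph-signal form, v_i = sum_j w_ij * ||x_i - x_j||^2, uses the tokens directly as queries and keys, and merges adjacent tokens in a 1D sequence by averaging. The function names, the symmetrisation step and the `threshold` parameter are all hypothetical.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention_graph(tokens):
    """Edge weights of a fully connected graph over the tokens.

    Assumption: queries and keys are the tokens themselves (no learned
    projections), and the attention matrix is symmetrised so that the
    graph is undirected.
    """
    dim = tokens.shape[1]
    attn = softmax(tokens @ tokens.T / np.sqrt(dim), axis=-1)
    return 0.5 * (attn + attn.T)


def node_variation(tokens, weights):
    """Assumed definition: v_i = sum_j w_ij * ||x_i - x_j||^2."""
    diff = tokens[:, None, :] - tokens[None, :, :]  # (n, n, dim) via broadcasting
    sq_dist = (diff ** 2).sum(axis=-1)              # (n, n) squared distances
    return (weights * sq_dist).sum(axis=1)          # (n,) one score per node


def merge_smooth_tokens(tokens, threshold):
    """Average adjacent token pairs whose variation is below `threshold`.

    Hypothetical merging rule: a low variation score is taken to mean the
    token differs little from the rest of the graph, so it can be stored
    at a coarser scale.
    """
    variation = node_variation(tokens, attention_graph(tokens))
    merged, i = [], 0
    while i < len(tokens):
        both_smooth = (
            i + 1 < len(tokens)
            and variation[i] < threshold
            and variation[i + 1] < threshold
        )
        if both_smooth:
            merged.append(0.5 * (tokens[i] + tokens[i + 1]))
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return np.stack(merged)
```

Under these assumptions, calling `merge_smooth_tokens(tokens, threshold)` on an `(n, dim)` array returns between `ceil(n / 2)` and `n` tokens, so the sequence mixes fine-scale and merged coarse-scale tokens; how the merged tokens are positioned and fed to later layers is left open here.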
