Abstract
Extracting robust feature representations is one of the key challenges in the person re-identification (ReID) task. Although convolutional neural network (CNN)-based methods have achieved great success, they still cannot handle the part occlusion and misalignment caused by their limited receptive fields. Recently, pure transformer models have shown their power in the person ReID task. However, current transformer models take equal-scale patches as input and cannot properly model cross-scale interaction. To overcome this problem, an adaptive cross-scale transformer designed from a graph-signal perspective, named ACSFormer, is proposed. Specifically, the self-attention module is first treated as an undirected fully connected graph. Then, "node variation" is introduced as an indicator to adaptively merge neighbourhood tokens. To the best of the authors' knowledge, ACSFormer is the first work to combine pure transformers and graph signal processing in the field of person ReID. Extensive evaluations are conducted on three person ReID datasets to validate the performance of ACSFormer. Experiments demonstrate that ACSFormer performs on par with state-of-the-art CNN-based methods and consistently improves on the transformer-based baseline, for example, surpassing the ViT baseline by 2.5%, 2.7% and 4.8% mAP on Market1501, DukeMTMC-reID and MSMT17, respectively.
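The abstract only sketches the "node variation" idea, so the following is a minimal illustrative sketch, not the authors' implementation: if patch tokens are viewed as a signal on a token graph, the variation between neighbouring tokens can be measured (here simply as the L2 norm of the embedding difference, an assumption), and adjacent tokens with low variation can be merged to form coarser-scale tokens. The function names and the greedy averaging strategy below are hypothetical.

```python
import numpy as np

def node_variation(tokens):
    # tokens: (N, D) array of patch embeddings viewed as a signal on a
    # token graph. Variation between adjacent nodes is taken here as the
    # L2 norm of the difference between neighbouring embeddings
    # (a simplifying assumption for illustration).
    return np.linalg.norm(tokens[1:] - tokens[:-1], axis=1)

def merge_low_variation(tokens, threshold):
    # Greedily merge each token into its predecessor when the variation
    # between them falls below the threshold (averaging the embeddings),
    # keeping distinctive tokens intact. The result is a shorter,
    # cross-scale token sequence.
    merged = [tokens[0]]
    for tok, var in zip(tokens[1:], node_variation(tokens)):
        if var < threshold:
            merged[-1] = (merged[-1] + tok) / 2.0
        else:
            merged.append(tok)
    return np.stack(merged)

# Toy example: four near-identical tokens followed by four random ones.
rng = np.random.default_rng(0)
base = rng.normal(size=(1, 8))
tokens = np.concatenate([base + 1e-3 * rng.normal(size=(4, 8)),
                         rng.normal(size=(4, 8))], axis=0)
out = merge_low_variation(tokens, threshold=0.5)
print(tokens.shape, "->", out.shape)  # the near-identical run collapses
```

In a real transformer this merging would operate on the attention graph rather than a 1-D neighbour chain, but the sketch shows how a variation indicator can drive adaptive token merging.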