Abstract
The appearance of pedestrians is often obscured by various obstacles. Some existing works address this occlusion problem by aligning the query image of the target pedestrian with the body parts in the gallery image, but the body structure of a pedestrian is complicated and not easy to align. Therefore, this paper introduces an alignment-free, Transformer-based Human Feature Aggregation (HFA) approach, which uses pose information to separate the body parts of the target pedestrian from the occlusion. Initially, the Vision Transformer incorporates the advantages of Convolutional Neural Networks (CNNs) to enhance the extraction of more fine-grained global and local features. Subsequently, the body parts of the target pedestrian are separated from the obstructions using pose information extracted by a pose estimator. Finally, in the human feature aggregation module, local features are matched and fused with pose information to enrich the human features, which steers the model to focus more on body parts. The experimental findings indicate that the proposed HFA approach surpasses alternative methods on multiple benchmark datasets.