A Simple Visual-Textual Baseline for Pedestrian Attribute Recognition

Xinhua Cheng, Qian Wang, Mengxi Jia, Jian Zhang

doi:10.1109/tcsvt.2022.3178144

Abstract

Pedestrian attribute recognition (PAR), which aims to identify the attributes of pedestrians captured in video surveillance, is a challenging task because of poor image quality and the diverse spatial distribution of attributes. Existing methods usually model PAR as a multi-label classification problem and manually map attributes to an ordered list corresponding to the outputs of classifiers or sequential models. However, the textual information inherent in attribute annotations is largely neglected by these visual-only methods. In this paper, we alleviate this issue by proposing a novel visual-textual baseline (VTB) for PAR, which introduces an additional textual modality and uses pre-trained textual encoders, rather than human definitions, to explore the semantic correlations among attribute annotations. VTB encodes pedestrian images and attribute annotations into visual and textual features respectively, lets information interact across modalities, and predicts each recognition result independently to remove the influence of attribute order. Furthermore, we introduce a transformer encoder as the cross-modal fusion module in VTB to sufficiently explore intra-modal and cross-modal correlations. Our method achieves superior performance over most existing visual-only methods on two widely used datasets, RAP and PA-100K, demonstrating the effectiveness of the textual modality for PAR. We expect our method to serve as a multimodal PAR baseline and to inspire new insights for multimodal fusion in future PAR research. Our code is available at https://github.com/cxh0519/VTB.
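The pipeline described in the abstract (encode both modalities, fuse them with a transformer-style module, then score each attribute independently) can be illustrated with a small toy sketch. This is not the authors' implementation. It assumes random vectors in place of pre-trained encoder outputs, a single parameter-free self-attention layer with no positional encoding in place of the transformer encoder, one scoring head shared by all attributes, and hypothetical attribute names. Under those assumptions it shows why the score for an attribute does not depend on where that attribute sits in the list.

```python
import math
import random


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def self_attention(tokens):
    """One parameter-free self-attention layer with a residual connection.

    Stand-in for the cross-modal fusion module (assumption: no learned
    projections and no positional encoding).
    """
    d = len(tokens[0])
    fused = []
    for q in tokens:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in tokens])
        mixed = [sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)]
        fused.append([a + b for a, b in zip(q, mixed)])
    return fused


def predict(visual_tokens, text_tokens, head_w, head_b):
    """Fuse both modalities, then score each attribute token on its own."""
    fused = self_attention(visual_tokens + text_tokens)
    attr_out = fused[len(visual_tokens):]  # one fused token per attribute
    return [1.0 / (1.0 + math.exp(-(dot(t, head_w) + head_b))) for t in attr_out]


random.seed(0)
DIM = 8
attributes = ["female", "backpack", "long sleeves"]  # hypothetical annotations

# Random vectors stand in for pre-trained encoder features (assumption).
visual = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(4)]
text = {a: [random.gauss(0, 1) for _ in range(DIM)] for a in attributes}

# A single scoring head shared by every attribute (assumption).
head_w = [random.gauss(0, 1) for _ in range(DIM)]
head_b = 0.0


def scores_for(order):
    """Return {attribute: probability} for attributes fed in the given order."""
    probs = predict(visual, [text[a] for a in order], head_w, head_b)
    return dict(zip(order, probs))


forward = scores_for(attributes)
backward = scores_for(list(reversed(attributes)))
```

Comparing `forward` and `backward` attribute by attribute should give the same probabilities up to floating-point rounding, because each attribute token attends to the same set of visual and textual tokens regardless of list order.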

Journal: IEEE Transactions on Circuits and Systems for Video Technology
Publication Date: Oct 1, 2022
Citations: 15

Similar Papers

Attention Based CNN-ConvLSTM for Pedestrian Attribute Recognition.
Yang Li ... Junsheng Xiao
Sensors | VOL. 20
03 Feb 2020

Selective and Orthogonal Feature Activation for Pedestrian Attribute Recognition
Junyi Wu ... Yuzhen Niu
Proceedings of the AAAI Conference on Artificial Intelligence | VOL. 38
24 Mar 2024

Reinforced pedestrian attribute recognition with group optimization reward
Zhong Ji ... Yanwei Pang
Image and Vision Computing | VOL. 128
01 Dec 2022

Pedestrian Attribute Recognition with Part-based CNN and Combined Feature Representations
Yiqiang Chen ... Atilla Baskurt
01 Jan 2018
