Abstract

Pedestrian attribute recognition has attracted increasing attention due to its wide range of potential applications. However, pedestrian images are often taken from a far distance, which significantly increases the difficulty of extracting complete attribute features. To solve these problems and further improve the accuracy of pedestrian attribute recognition with convolutional neural networks, we propose a method based on multi-scale feature fusion and cross attention. First, we use the ResNeXt50 backbone to extract shallow and deep features and then fuse them to improve attribute recognition accuracy. Second, we adopt a cross attention mechanism to generate pixel-level contextual information that enhances the feature information important to each attribute. Finally, we feed the feature map into a fully connected layer for classification. Extensive experiments show that our proposed method achieves state-of-the-art results on three pedestrian attribute datasets: PA-100K, RAP, and PETA. The corresponding mean accuracies reach 81.57%, 81.23%, and 85.85%, respectively.
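The pipeline described above (fuse shallow and deep backbone features, apply pixel-level cross attention, then classify with a fully connected layer) can be sketched as follows. This is a minimal illustrative sketch, not the authors' exact architecture: the feature-map shapes, the nearest-neighbour upsampling, the dot-product form of the attention, and the 26-attribute output head (26 is the PA-100K label count) are all assumptions made for the example.

```python
# Minimal NumPy sketch of the described pipeline:
# (1) fuse a shallow and a deep feature map,
# (2) apply a simple pixel-level cross attention,
# (3) classify with a fully connected layer.
import numpy as np

rng = np.random.default_rng(0)

# Backbone outputs (assumed shapes): a shallow 8x8 map with 64 channels
# and a deep 4x4 map with 128 channels, standing in for ResNeXt50 stages.
shallow = rng.standard_normal((64, 8, 8))
deep = rng.standard_normal((128, 4, 4))

# Upsample the deep map to the shallow resolution (nearest neighbour).
deep_up = deep.repeat(2, axis=1).repeat(2, axis=2)   # (128, 8, 8)

# Multi-scale fusion by channel concatenation -> (192, 8, 8).
fused = np.concatenate([shallow, deep_up], axis=0)

# Pixel-level cross attention (illustrative form): every spatial position
# attends over all positions via softmax-normalized dot-product similarity.
c, h, w = fused.shape
x = fused.reshape(c, h * w)               # (C, N), N = H*W pixels
scores = x.T @ x / np.sqrt(c)             # (N, N) pairwise similarities
scores -= scores.max(axis=1, keepdims=True)
attn = np.exp(scores)
attn /= attn.sum(axis=1, keepdims=True)   # softmax over positions
context = x @ attn.T                      # (C, N) contextual features
enhanced = x + context                    # residual enhancement

# Global average pool, then a fully connected layer for 26 binary
# attributes (the PA-100K label count; purely illustrative here).
pooled = enhanced.mean(axis=1)            # (C,)
W_fc = rng.standard_normal((26, c)) * 0.01
logits = W_fc @ pooled
probs = 1.0 / (1.0 + np.exp(-logits))     # per-attribute probabilities

print(probs.shape)                        # (26,)
```

In a real implementation the attention weights and the fully connected layer would be learned jointly with the backbone; the sketch only makes the data flow of the three stages concrete.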

