Abstract
This paper presents an extended model for a pedestrian attribute recognition network that uses skeleton data as a soft attention model to extract the local feature corresponding to a specific attribute. This technique keeps valuable information surrounding the target area and handles variation in human posture. The attention masks are designed to focus on both partial and whole-body regions. An augmented layer performs data augmentation inside the network to reduce over-fitting. The network was evaluated on two datasets (RAP and PETA) with several backbone networks (ResNet-50, Inception V3, and Inception-ResNet V2). The experimental results show that our network improves overall classification performance, raising mean accuracy by about 2–3% over the same backbone network, especially for local attributes and varied human postures.
Highlights
Image analysis of surveillance systems has gained attention in a wide range of possible aspects.
This paper focuses on an extension module that improves attribute classification performance for the pedestrian attribute recognition (PAR) network.
This paper describes the extended module for the PAR network, which is built on a soft attention module.
Summary
Image analysis of surveillance systems has gained attention in a wide range of possible aspects. Partial image classification was included in PAR to focus on the local feature of each attribute and to reduce the effects of image conditions. Specifically, this idea narrows the region of interest (ROI). With the proposed soft attention mask, attachment attributes (e.g., backpack, hat) become visible and their local features can be extracted, as with the backpack shown within the red circle in Figure 1. When skeleton data are missing, holistic features extracted by the backbone network support the human-part attention module. The proposed soft attention mask is formulated from skeleton data, which makes it insensitive to variation in human posture. In addition to local features from the soft attention model, features from the neighboring background regions are kept to handle various viewpoints and postures.
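The idea of a skeleton-driven soft mask that still keeps some neighboring background can be illustrated with a minimal sketch. It is not the paper's formulation: the Gaussian shape, the `sigma` and `floor` parameters, the uniform fallback when no joints are available, and the function names `soft_attention_mask` and `apply_soft_attention` are all assumptions chosen for illustration.

```python
import numpy as np


def soft_attention_mask(keypoints, height, width, sigma=2.0):
    """Build a soft mask in [0, 1] from 2D joints given as (x, y) or None.

    Assumption: each joint contributes an isotropic Gaussian bump and the
    bumps are merged with a pixel-wise maximum.
    """
    valid = [kp for kp in keypoints if kp is not None]
    if not valid:
        # Assumed fallback: with no skeleton, weight the whole map uniformly
        # so that only holistic features are used.
        return np.ones((height, width), dtype=np.float32)
    ys, xs = np.mgrid[0:height, 0:width]
    mask = np.zeros((height, width), dtype=np.float32)
    for x, y in valid:
        bump = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
        mask = np.maximum(mask, bump.astype(np.float32))
    return mask


def apply_soft_attention(features, mask, floor=0.2):
    """Reweight a (C, H, W) feature map with an (H, W) mask.

    Assumption: `floor` is the minimum weight, so background regions near
    the target are attenuated rather than removed.
    """
    weights = floor + (1.0 - floor) * mask
    return features * weights[None, :, :]


# Hypothetical example: a 4-channel 8x6 feature map and two detected
# shoulder joints, with a third joint missing.
features = np.ones((4, 8, 6), dtype=np.float32)
shoulders = [(1.0, 2.0), (4.0, 2.0), None]
mask = soft_attention_mask(shoulders, height=8, width=6)
local = apply_soft_attention(features, mask)
fallback = soft_attention_mask([None, None], height=8, width=6)
```

Because the weights never drop below `floor`, the reweighted features retain a reduced contribution from the area around each joint, which is one simple way to model the "soft" rather than hard cropping described above.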