Abstract

This paper presents an extension module for a pedestrian attribute recognition (PAR) network that uses skeleton data as a soft attention model to extract the local features corresponding to a specific attribute. This technique preserves valuable information surrounding the target area and handles variation in human posture. The attention masks are designed to focus on both partial and whole-body regions. An augmented layer performs data augmentation inside the network to reduce over-fitting. The network was evaluated on two datasets (RAP and PETA) with various backbone networks (ResNet-50, Inception V3, and Inception-ResNet V2). The experimental results show that the network improves overall classification performance, raising mean accuracy by about 2–3% over the same backbone network, especially for local attributes and varied human postures.

Highlights

  • Image analysis of surveillance systems has gained attention across a wide range of applications

  • This paper focuses on an extension module that improves attribute classification performance for the pedestrian attribute recognition (PAR) network

  • This paper describes an extension module for the PAR network built around a soft attention module

Summary

Introduction

Image analysis of surveillance systems has gained attention across a wide range of applications. Partial image classification was introduced into PAR to focus on the local features of each attribute and to reduce the effects of image conditions. Specifically, this idea narrows the region of interest (ROI). With the proposed soft attention mask, attachment attributes (e.g., backpack, hat) are visualized and their local features can be extracted, as with the backpack shown within the red circle in Figure 1. When skeleton data are missing, holistic features extracted by a backbone network support the human-part attention module. The proposed method uses a soft attention mask formulated from skeleton data, which is insensitive to variation in human posture. Besides the local features from the soft attention model, features from the neighboring background regions are kept to handle various viewpoints and postures.
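As an illustration of how a skeleton-based soft attention mask might be built, the following Python sketch places a Gaussian bump at each detected keypoint and multiplies the result into a feature map. The Gaussian form, the `sigma` value, the function names, and the all-ones fallback for missing skeletons are assumptions made for this example; the summary does not specify the exact formulation used in the paper.

```python
import numpy as np


def soft_attention_mask(keypoints, height, width, sigma=2.0):
    """Build a soft mask in [0, 1] from (x, y) keypoints.

    A keypoint given as None is treated as missing (assumed convention).
    """
    ys, xs = np.mgrid[0:height, 0:width]
    mask = np.zeros((height, width), dtype=np.float64)
    found = False
    for kp in keypoints:
        if kp is None:
            continue  # skip joints the pose estimator did not return
        x, y = kp
        # Gaussian falloff keeps some weight on the surrounding region
        bump = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
        mask = np.maximum(mask, bump)
        found = True
    if not found:
        # Assumed fallback: with no skeleton, keep the holistic features
        return np.ones((height, width), dtype=np.float64)
    return mask


def apply_mask(features, mask):
    """Weight a (C, H, W) feature map by an (H, W) mask."""
    return features * mask[None, :, :]


if __name__ == "__main__":
    mask = soft_attention_mask([(3, 2), None], height=8, width=6)
    features = np.ones((4, 8, 6))
    print(apply_mask(features, mask).shape)
```

Because the mask decays smoothly instead of cutting off at a hard boundary, pixels near a joint still contribute, which is consistent with the stated goal of keeping features from neighboring background regions.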

Pedestrian Attribute Recognition
Visual Attention Model
Human Skeleton and Pose Estimation
Attention Mask
PAR Network Architecture
Backbone Network
Human-Part Attention Module
Classification Layers
Training Method
Human Attribute Augmentation
Dataset
Implementation Detail
Evaluation Metric
Overall Performance
Attribute-Level Performance
Time Complexity
Discussions
Surrounding Region
Occlusion
Irregular Human Posture
Conclusions