Efficient Deep Learning Approach to Recognize Person Attributes by Using Hybrid Transformers for Surveillance Scenarios

S Raghavendra, Venu Madhav Nookala, S Kaliraj, S K Abhilash, Ramyashree Ramyashree

doi:10.1109/access.2023.3241334

Open Access

https://doi.org/10.1109/access.2023.3241334

Abstract

Pedestrian attribute recognition underpins many deep perception technologies and methods, with applications in autonomous driving, surveillance, and object tracking. Recognizing person attributes is a recent research area that has attracted considerable interest in video surveillance. Locating the distinct regions of a person that correspond to each attribute is difficult, yet it plays a significant role in recognition. This paper presents a method that is applied to basic convolutional neural networks to locate the region associated with each person attribute. Person attributes such as gender, age, clothing style, and accessories have received much attention in video surveillance analytics. This article adopts a Conv-Attentional image Transformer that decomposes the most discriminative attributes and regions into multiple levels. Its serial blocks consist of conv-attention and a feed-forward network, while its parallel blocks use two attention strategies: direct cross-layer attention and attention with feature interpolation. The approach also provides a flexible Attribute Localization Module (ALM) that learns the regional features of each attribute at several levels and adaptively selects the most discriminative regions. We conclude that hybrid transformers outperform pure transformers for this task. Extensive experimental results indicate that the proposed hybrid technique outperforms existing methods on four person attribute datasets: RapV2, RapV1, PETA, and PA100K.
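
As a rough illustration of attribute prediction from several feature levels, the sketch below fuses per-level attribute scores and thresholds them. It is not the authors' implementation: the abstract does not say how the levels are combined, so the element-wise maximum used here is an assumption, and the attribute names, logit values, and the helper names `fuse_levels` and `predict_attributes` are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse_levels(level_logits):
    """Combine per-level logits by taking the element-wise maximum.

    ASSUMPTION: max fusion is one common choice for multi-level
    attribute prediction; the abstract does not specify the rule.
    """
    return [max(values) for values in zip(*level_logits)]


def predict_attributes(level_logits, names, threshold=0.5):
    """Return a present/absent decision for each attribute (multi-label)."""
    fused = fuse_levels(level_logits)
    return {name: sigmoid(z) >= threshold for name, z in zip(names, fused)}


# Hypothetical attributes and logits from three feature levels.
names = ["female", "backpack", "hat"]
level_logits = [
    [-1.2, 0.3, -2.0],  # level 1 (coarse)
    [0.8, -0.5, -1.5],  # level 2
    [-0.4, 1.1, -3.0],  # level 3 (fine)
]

result = predict_attributes(level_logits, names)
print(result)
```

Each attribute is decided independently, so a person can carry any combination of labels. Taking the maximum lets the level that responds most strongly to an attribute determine its score.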
