Hybrid Attention Network for Language-Based Person Search.

Yang Li, Junsheng Xiao, Huahu Xu

doi:10.3390/s20185279

Open Access

https://doi.org/10.3390/s20185279

Abstract

Language-based person search retrieves images of a target person from a natural language description; it is a challenging fine-grained cross-modal retrieval task. A novel hybrid attention network is proposed for the task. The network has three components. First, a cubic attention mechanism for person images combines cross-layer spatial attention and channel attention. It exploits both important midlevel details and key high-level semantics to obtain a more discriminative fine-grained feature representation of a person image. Second, a text attention network for the language description is built on a bidirectional LSTM (BiLSTM) and a self-attention mechanism. It learns bidirectional semantic dependencies and captures the key words of a sentence, so that the context information and key semantic features of the description are extracted more effectively and accurately. Third, a cross-modal attention mechanism and a joint loss function for cross-modal learning focus on the relevant parts of the text and image features. They exploit both cross-modal and intra-modal correlation and better address the problem of cross-modal heterogeneity. Extensive experiments on the CUHK-PEDES dataset show that our approach achieves higher performance than state-of-the-art approaches, demonstrating its advantage.
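The text branch described in the abstract pools per-word BiLSTM states with self-attention so that key words dominate the sentence feature. The following is a minimal sketch of that pooling step only, not the authors' implementation: the dot-product scoring, the `query` vector, and the toy `hidden_states` values are assumptions made for illustration, and a real model would learn the scoring parameters and obtain the hidden states from a BiLSTM.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Weight each word state by its relevance and sum them.

    hidden_states: one vector per word (stand-in for BiLSTM outputs).
    query: hypothetical scoring vector; learned in a real model.
    """
    # Relevance score per word: dot product with the query vector.
    scores = [sum(q * h for q, h in zip(query, state)) for state in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # Sentence feature: attention-weighted sum of the word states.
    pooled = [
        sum(w * state[i] for w, state in zip(weights, hidden_states))
        for i in range(dim)
    ]
    return weights, pooled


# Toy input: three words, two-dimensional states (made-up numbers).
hidden_states = [[0.1, 0.0], [2.0, 1.0], [0.2, 0.1]]
query = [1.0, 1.0]
weights, sentence_vec = attention_pool(hidden_states, query)
```

In this toy input the second word has the largest score, so it receives the largest attention weight and contributes most to `sentence_vec`.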

Highlights

In today’s society, video surveillance has become an important means of public security, and thousands of surveillance cameras have been installed in public places
Unlike most other methods, which generate both spatial attention and channel attention from the highest convolution layer or from the same layer of the network [12], we propose a cross-layer cubic attention mechanism that generates spatial attention from the midlevel network and channel attention from the high-level network. This fully leverages the spatial information and rich details of the midlevel network and the rich semantics of the high-level network, and improves performance on this fine-grained task
Unlike methods that generate both spatial attention (SA) and channel attention (CA) from the conv5 layer of ResNet50, which is too abstract to provide sufficient detail and spatial information for effective spatial attention, we propose a mechanism that generates spatial attention from the conv4 layer of ResNet50 and channel attention from the conv5 layer
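The cross-layer idea in the highlights above can be sketched as follows: a spatial map is computed from a midlevel feature, a channel vector from a high-level feature, and both rescale the high-level feature. This is an illustrative sketch under stated assumptions, not the paper's architecture. The function names are hypothetical, the learned attention layers are replaced by parameter-free average pooling followed by a sigmoid, and the two features are assumed to share the same height and width already (in practice the midlevel map would need resizing).

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def spatial_attention(mid_feat):
    """One weight per location: channel-wise mean of the midlevel feature."""
    channels = len(mid_feat)
    height, width = len(mid_feat[0]), len(mid_feat[0][0])
    return [
        [
            sigmoid(sum(mid_feat[k][i][j] for k in range(channels)) / channels)
            for j in range(width)
        ]
        for i in range(height)
    ]


def channel_attention(high_feat):
    """One weight per channel: global average pool of the high-level feature."""
    return [
        sigmoid(sum(sum(row) for row in ch) / (len(ch) * len(ch[0])))
        for ch in high_feat
    ]


def cubic_attention(mid_feat, high_feat):
    """Rescale each high-level activation by its spatial and channel weights."""
    sa = spatial_attention(mid_feat)
    ca = channel_attention(high_feat)
    return [
        [
            [high_feat[k][i][j] * sa[i][j] * ca[k] for j in range(len(sa[0]))]
            for i in range(len(sa))
        ]
        for k in range(len(ca))
    ]


# Toy features in channels x height x width layout (made-up numbers).
mid = [[[1.0, -1.0], [0.5, 2.0]], [[0.0, 1.0], [-0.5, 1.0]]]
high = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.5, 0.5], [0.5, 0.5]],
    [[-1.0, 0.0], [1.0, 2.0]],
]
out = cubic_attention(mid, high)
```

The output keeps the shape of the high-level feature; every activation is scaled by a factor between 0 and 1, so locations and channels with low attention are suppressed.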

Summary

Introduction

In today’s society, video surveillance has become an important means of public security, and thousands of surveillance cameras have been installed in public places. The automatic search for persons of interest in large-scale video or image databases has attracted increasing attention from researchers. Person re-identification (reID) has a major limitation: it requires at least one image of the target person, and in some real cases no such image can be obtained. In these cases, the target person can only be searched for in the surveillance video/image database using a witness's natural language description of the person's appearance, a task called text-based person search (TBPS).


Journal: Sensors
Publication Date: Sep 15, 2020
Citations: 3
License type: CC BY 4.0
