Attention-in-Attention Networks for Surveillance Video Understanding in Internet of Things

Ning Xu, An-An Liu, Yu-Ting Su, Wei-Zhi Nie

doi:10.1109/jiot.2017.2779865

Abstract

In this paper, we propose an approach to generate comprehensive video interpretations for surveillance video understanding in the Internet of Things. The key problem of many visual learning tasks is to adaptively select and fuse diverse and complementary features for video representation. We design the attention-in-attention (AIA) network to hierarchically explore attention fusion in an end-to-end manner, and demonstrate the value of this model on the multievent recognition and video captioning challenges. Specifically, the network consists of multiple encoder attention modules (EAMs) and a fusion attention module (FAM). Each EAM highlights the space-specific features by selecting the most salient visual features or semantic attributes and averaging them into one attentive feature. The FAM suppresses or enhances the activation of the multispace attentive features and adaptively co-embeds them for comprehensive video representation. Then, one long short-term memory unit decodes the video representations to generate multiple event labels or video captions. This architecture is capable of: 1) adaptively learning the salient space-specific feature representation and 2) co-embedding multispace attentive features into one space for feature fusion. Experiments on the surveillance video dataset (Concurrent Event Dataset) and the popular video captioning datasets (Microsoft Research Video Description Corpus and MSR-Video to Text) show that the proposed AIA achieves competitive performance against the state of the art.
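The two-level attention described in the abstract can be illustrated with a minimal sketch. The abstract gives no equations, so everything here is an assumption made for illustration: dot-product scoring against a decoder state, softmax weights, feature spaces that already share one dimension (the co-embedding projection is skipped), and the hypothetical names attend, aia_step, and the three example spaces. It is not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(vectors, query):
    """Soft attention: score each vector against the query (dot product,
    an assumed scoring function), then return the weighted average."""
    scores = [sum(a * b for a, b in zip(v, query)) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return pooled, weights


def aia_step(spaces, query):
    """Attention-in-attention sketch.

    Inner level (EAM-like): one attentive feature per feature space.
    Outer level (FAM-like): attention over those per-space features,
    which down-weights or up-weights each space before fusing them.
    """
    attentive = [attend(vectors, query)[0] for vectors in spaces.values()]
    return attend(attentive, query)


# Hypothetical toy inputs: three feature spaces, all of dimension 3.
spaces = {
    "appearance": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    "motion": [[0.0, 0.0, 1.0], [1.0, 1.0, 0.0]],
    "attributes": [[0.5, 0.5, 0.5]],
}
decoder_state = [1.0, 0.0, 0.0]  # stands in for the LSTM hidden state
video_repr, fusion_weights = aia_step(spaces, decoder_state)
print(video_repr, fusion_weights)
```

In a full model the fused vector would be fed to the LSTM decoder at each step, and the scoring functions would have learned parameters; both are left out of this sketch.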

Journal: IEEE Internet of Things Journal
Publication Date: Oct 1, 2018
Citations: 61

Similar Papers

Video Captioning with Multi-Faceted Attention
Xiang Long ... Chuang Gan
Transactions of the Association for Computational Linguistics | VOL. 6
01 Dec 2018

Multistream hierarchical boundary network for video captioning
Thang Nguyen ... Shagan Sah
01 Nov 2017

CMGNet: Collaborative multi-modal graph network for video captioning
Qi Rao ... Linchao Zhu
Computer Vision and Image Understanding | VOL. 238
18 Oct 2023

Real-time Arabic Video Captioning Using CNN and Transformer Networks Based on Parallel Implementation
Adel Jalal Yousif ... Mohammed H Al-Jammas
Diyala Journal of Engineering Sciences
07 Mar 2024