Expansion-Squeeze-Excitation Fusion Network for Elderly Activity Recognition

Xiangbo Shu, Rui Yan, Jiawen Yang, Yan Song

doi:10.1109/tcsvt.2022.3142771

Abstract

This work focuses on elderly activity recognition, a challenging task because elderly activities involve both individual actions and human-object interactions. We therefore aim to aggregate the discriminative information of actions and interactions from both RGB videos and skeleton sequences by attentively fusing multi-modal features. Recently, several nonlinear multi-modal fusion approaches have been proposed that use a nonlinear attention mechanism extended from Squeeze-and-Excitation Networks (SENet). Inspired by this, we propose a novel Expansion-Squeeze-Excitation Fusion Network (ESE-FN) for elderly activity recognition, which learns modal-wise and channel-wise Expansion-Squeeze-Excitation (ESE) attentions to fuse the multi-modal features in both modal-wise and channel-wise ways. Specifically, ESE-FN first performs modal-wise fusion with the Modal-wise ESE Attention (M-ESEA) to aggregate discriminative information across modalities, and then performs channel-wise fusion with the Channel-wise ESE Attention (C-ESEA) to aggregate discriminative information across channels (see Figure 1). Furthermore, we design a new Multi-modal Loss (ML) that keeps the single-modal features and the fused multi-modal features consistent by adding a penalty on the difference between the minimum prediction loss over the single modalities and the prediction loss on the fused modality. Finally, we conduct experiments on the largest-scale elderly activity dataset, ETRI-Activity3D (110,000+ videos and 50+ categories), and show that the proposed ESE-FN achieves the best accuracy among state-of-the-art methods. In addition, more extensive experimental results show that ESE-FN is also comparable to other methods on the general action recognition task.
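The Multi-modal Loss described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract only states that a penalty on the difference between the minimum single-modality prediction loss and the fused prediction loss is added. The sketch assumes that the prediction loss is cross-entropy, that the penalty is an absolute difference, and that it is scaled by a hypothetical weight `penalty_weight`; the function names are likewise illustrative.

```python
import math


def cross_entropy(logits, label):
    """Cross-entropy of one sample computed from raw logits (stable log-sum-exp)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def multimodal_loss(fused_logits, single_modal_logits, label, penalty_weight=1.0):
    """Fused prediction loss plus a consistency penalty (assumed form).

    fused_logits: class scores predicted from the fused multi-modal feature.
    single_modal_logits: one list of class scores per single modality
        (for example RGB and skeleton).
    penalty_weight: hypothetical scaling factor, not taken from the paper.
    """
    fused_loss = cross_entropy(fused_logits, label)
    # Minimum prediction loss over the single modalities.
    best_single_loss = min(cross_entropy(l, label) for l in single_modal_logits)
    # Assumed penalty: absolute gap between the best single-modal loss
    # and the loss on the fused modality.
    penalty = abs(best_single_loss - fused_loss)
    return fused_loss + penalty_weight * penalty


if __name__ == "__main__":
    # Toy two-class example with made-up logits for illustration only.
    fused_scores = [2.0, 0.0]
    rgb_scores = [1.0, 0.0]
    skeleton_scores = [0.0, 0.0]
    print(multimodal_loss(fused_scores, [rgb_scores, skeleton_scores], label=0))
```

Under these assumptions the penalty vanishes when the fused prediction is exactly as good as the best single modality, and setting `penalty_weight` to zero recovers the plain fused prediction loss.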

Journal: IEEE Transactions on Circuits and Systems for Video Technology: a publication of the Circuits and Systems Society
Publication Date: Aug 1, 2022
Citations: 114

Similar Papers

Signal Alignment for Humanoid Skeletons via the Globally Optimal Reparameterization Algorithm
Thomas W Mitchel ... Sipu Ruan
01 Nov 2018

SSRT: A Sequential Skeleton RGB Transformer to Recognize Fine-Grained Human-Object Interactions and Action Recognition
Akash Ghimire ... Hakil Kim
IEEE Access: Practical Innovations, Open Solutions | Vol. 11
01 Jan 2023

Action recognition by single stream convolutional neural networks: An approach using combined motion and static information
Sameera Ramasinghe ... Ranga Rodrigo
01 Nov 2015

Recognizing Daily Activities from First-Person Videos with Multi-task Clustering
Yan Yan ... Nicu Sebe
01 Jan 2015
