Abstract

Human Action Recognition (HAR) is a challenging problem in computer vision that has received a great deal of attention over the last decade. With the advent of deep learning techniques such as convolutional neural networks (CNNs), the recognition performance of HAR systems has improved significantly over traditional methods, largely owing to the powerful representation capabilities of CNNs. Most of the literature uses 2D CNNs or their 3D counterparts to learn spatial and temporal image-level features from videos. In this paper, we develop an end-to-end HAR framework based on a hybrid 2D/3D CNN. The hybrid feature extractor exploits the potential complementarity of the two: 2D convolutions learn spatial, per-frame features, while 3D convolutions learn short-range spatio-temporal features. The CNN features extracted from the video sequence are then fed into a Long Short-Term Memory (LSTM) network to capture short- and long-term temporal dependencies in the data. Inspired by human visual attention mechanisms, we also incorporate a visual attention module that focuses on semantically relevant, salient features in the visual representations. The model is trained and evaluated on the KTH dataset and achieves promising recognition performance compared with state-of-the-art methods.
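Since the abstract outlines the full pipeline (a hybrid 2D/3D CNN feature extractor, a visual attention module, and an LSTM classifier), a minimal sketch may help fix ideas. The sketch below assumes PyTorch; the class name HybridCNNLSTM, all layer sizes, and the simple per-frame soft-attention scoring are illustrative assumptions, not the authors' exact architecture. Only the six-class output reflects the KTH dataset, which contains six action categories.

    import torch
    import torch.nn as nn

    class HybridCNNLSTM(nn.Module):
        """Illustrative hybrid 2D/3D CNN + soft attention + LSTM classifier."""

        def __init__(self, num_classes=6, feat_dim=128, hidden_dim=256):
            super().__init__()
            # 2D branch: spatial features, applied independently to each frame.
            self.cnn2d = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(32, feat_dim),
            )
            # 3D branch: short-range spatio-temporal features over the clip.
            self.cnn3d = nn.Sequential(
                nn.Conv3d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool3d((None, 1, 1)),  # pool space, keep time axis
            )
            self.proj3d = nn.Linear(32, feat_dim)
            # Soft attention: one relevance score per frame over fused features.
            self.attn = nn.Linear(2 * feat_dim, 1)
            # LSTM models short- and long-term dependencies across frames.
            self.lstm = nn.LSTM(2 * feat_dim, hidden_dim, batch_first=True)
            self.classifier = nn.Linear(hidden_dim, num_classes)

        def forward(self, clip):
            # clip: (batch, channels, time, height, width)
            b, c, t, h, w = clip.shape
            # Per-frame 2D features.
            frames = clip.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w)
            f2d = self.cnn2d(frames).view(b, t, -1)            # (b, t, feat_dim)
            # Clip-level 3D features, one vector per time step.
            f3d = self.cnn3d(clip).squeeze(-1).squeeze(-1)     # (b, 32, t)
            f3d = self.proj3d(f3d.permute(0, 2, 1))            # (b, t, feat_dim)
            # Fuse both streams and re-weight frames by attention.
            fused = torch.cat([f2d, f3d], dim=-1)              # (b, t, 2*feat_dim)
            weights = torch.softmax(self.attn(fused), dim=1)   # (b, t, 1)
            # LSTM over the attended sequence; classify from the final state.
            out, _ = self.lstm(fused * weights)
            return self.classifier(out[:, -1])

With this sketch, a call such as HybridCNNLSTM()(torch.randn(2, 3, 16, 64, 64)) processes two 16-frame RGB clips and returns logits of shape (2, 6).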
