Human Action Recognition In Videos Research Articles

The recognition of human activities using vision-based techniques has become a crucial research field in video analytics. Over the last decade, there have been numerous advancements in deep learning algorithms aimed at accurately detecting complex human actions in video streams. While these algorithms have demonstrated impressive performance in activity recognition, they often exhibit a bias towards either model performance or computational efficiency. This biased trade-off between robustness and efficiency poses challenges when addressing complex human activity recognition problems. To address this issue, this paper presents a computationally efficient yet robust approach, exploiting saliency-aware spatial and temporal features for human action recognition in videos. To achieve effective representation of human actions, we propose an efficient approach called the dual-attentional Residual 3D Convolutional Neural Network (DA-R3DCNN). Our proposed method utilizes a unified channel-spatial attention mechanism, allowing it to efficiently extract significant human-centric features from video frames. By combining dual channel-spatial attention layers with residual 3D convolution layers, the network becomes more discerning in capturing spatial receptive fields containing objects within the feature maps. To assess the effectiveness and robustness of our proposed method, we have conducted extensive experiments on four well-established benchmark datasets for human action recognition. The quantitative results obtained validate the efficiency of our method, showcasing significant improvements in accuracy of up to 11% as compared to state-of-the-art human action recognition methods. Additionally, our evaluation of inference time reveals that the proposed method achieves up to a 74× improvement in frames per second (FPS) compared to existing approaches, thus showing the suitability and effectiveness of the proposed DA-R3DCNN for real-time human activity recognition.

Read full abstract

Generally, the action recognition task requires a vast amount of labeled data, which represents a time-consuming human annotation effort. To mitigate the dependency on labeled data, this study proposes Semi-Supervised and Iterative Reinforcement Learning (RL-SSI), which adapts a supervised approach that uses 100% labeled data to a semi-supervised and iterative approach using reinforcement learning for human action recognition in videos. The JIGSAWS and Breakfast datasets were used to evaluate the RL-SSI model, because they are commonly used in the action segmentation task. The same applies to the performance metrics used in this work-F-Score (F1) and Edit Score-which are commonly applied for such tasks. In JIGSAWS tests, we observed that the RL-SSI outperformed previously developed state-of-the-art techniques in all quantitative measures, while using only 65% of the labeled data. When analysing the Breakfast tests, we compared the effectiveness of RL-SSI with the results of the self-supervised technique called SSTDA. We have found that RL-SSI outperformed SSTDA with an accuracy of 66.44% versus 65.8%, but RL-SSI was surpassed by the F1@10 segmentation measure, which presented an accuracy of 67.33% versus 69.3% for SSTDA. Despite this, our experiment only used 55.8% of the labeled data, while SSTDA used 65%. We conclude that our approach outperformed equivalent supervised learning methods and is comparable to SSTDA, when evaluated on multiple datasets of human action recognition, proving to be an important innovative method to successfully building solutions to reduce the amount of fully labeled data, leveraging the work of human specialists in the task of data labeling of videos, and their respectives frames, for human action recognition, thus reducing the required resources to accomplish it.

Read full abstract

Human Action Recognition In Videos Research Articles

Related Topics

Articles published on Human Action Recognition In Videos

Deep Learning-based Human Action Recognition in Videos

Residual attention fusion network for video action recognition

Action class relation detection and classification across multiple video datasets

A heterogeneous two-stream network for human action recognition

A Jeap-BiLSTM Neural Network for Action Recognition

Human Action Representation Learning Using an Attention-Driven Residual 3DCNN Network

Human Action Recognition

Multi-level channel attention excitation network for human action recognition in videos

ViT-ReT: Vision and Recurrent Transformer Neural Networks for Human Activity Recognition in Videos

Action density based frame sampling for human action recognition in videos

Analysis of CNN Architectures for Human Action Recognition in Video

An information-rich sampling technique over spatio-temporal CNN for classification of human actions in videos.

RL-SSI Model: Adapting a Supervised Learning Approach to a Semi-Supervised Approach for Human Action Recognition

Human interaction recognition framework based on interacting body part attention

A Novel Feature-Selection Method for Human Activity Recognition in Videos

Action Density Based Frame Sampling for Human Action Recognition in Videos

Learning Video Actions in Two Stream Recurrent Neural Network

Human Action Recognition Using Spatio-Temporal Multiplier Network and Attentive Correlated Temporal Feature

EduNet: A New Video Dataset for Understanding Human Activity in the Classroom Environment.

Human action recognition using attention based LSTM network with dilated CNN features

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Human Action Recognition In Videos Research Articles

Related Topics

Articles published on Human Action Recognition In Videos

Deep Learning-based Human Action Recognition in Videos

Residual attention fusion network for video action recognition

Action class relation detection and classification across multiple video datasets

A heterogeneous two-stream network for human action recognition

A Jeap-BiLSTM Neural Network for Action Recognition

Human Action Representation Learning Using an Attention-Driven Residual 3DCNN Network

Human Action Recognition

Multi-level channel attention excitation network for human action recognition in videos

ViT-ReT: Vision and Recurrent Transformer Neural Networks for Human Activity Recognition in Videos

Action density based frame sampling for human action recognition in videos

Analysis of CNN Architectures for Human Action Recognition in Video

An information-rich sampling technique over spatio-temporal CNN for classification of human actions in videos.

RL-SSI Model: Adapting a Supervised Learning Approach to a Semi-Supervised Approach for Human Action Recognition

Human interaction recognition framework based on interacting body part attention

A Novel Feature-Selection Method for Human Activity Recognition in Videos

Action Density Based Frame Sampling for Human Action Recognition in Videos

Learning Video Actions in Two Stream Recurrent Neural Network

Human Action Recognition Using Spatio-Temporal Multiplier Network and Attentive Correlated Temporal Feature

EduNet: A New Video Dataset for Understanding Human Activity in the Classroom Environment.

Human action recognition using attention based LSTM network with dilated CNN features