Action Recognition In Videos Research Articles

The recognition of human activities using vision-based techniques has become a crucial research field in video analytics. Over the last decade, deep learning algorithms for accurately detecting complex human actions in video streams have advanced considerably. Although these algorithms perform impressively in activity recognition, they typically favor either model performance or computational efficiency at the expense of the other. This one-sided trade-off between robustness and efficiency makes complex human activity recognition problems difficult to address. To tackle this issue, this paper presents a computationally efficient yet robust approach that exploits saliency-aware spatial and temporal features for human action recognition in videos. To represent human actions effectively, we propose the dual-attentional Residual 3D Convolutional Neural Network (DA-R3DCNN). The method uses a unified channel-spatial attention mechanism to extract salient human-centric features from video frames. By combining dual channel-spatial attention layers with residual 3D convolution layers, the network becomes more discerning in focusing on the spatial receptive fields that contain objects within the feature maps. To assess the effectiveness and robustness of the method, we conducted extensive experiments on four well-established benchmark datasets for human action recognition. The quantitative results confirm its effectiveness, with accuracy improvements of up to 11% over state-of-the-art human action recognition methods. In addition, our inference-time evaluation shows that DA-R3DCNN achieves up to a 74× improvement in frames per second (FPS) over existing approaches, indicating that it is suitable for real-time human activity recognition.
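The abstract does not spell out how the dual channel-spatial attention layers are built, so the following is only a minimal NumPy sketch assuming a CBAM-style (Convolutional Block Attention Module) design: channel attention, then spatial attention, applied to one 3D-convolution feature map of shape (C, T, H, W), with the input added back as a residual. The function names, the reduction ratio `r`, the random weights, and the linear spatial mix that stands in for CBAM's 7×7 convolution are all hypothetical, and the 3D convolutions themselves are omitted.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(x, w1, w2):
    """CBAM-style channel weights for a feature map x of shape (C, T, H, W).

    A shared two-layer MLP (C -> C/r -> C) scores the average- and max-pooled
    channel descriptors; the sum is squashed to (0, 1) per channel.
    """
    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # ReLU between the two layers

    avg = x.mean(axis=(1, 2, 3))  # (C,)
    mx = x.max(axis=(1, 2, 3))    # (C,)
    return sigmoid(mlp(avg) + mlp(mx)).reshape(-1, 1, 1, 1)  # (C, 1, 1, 1)


def spatial_attention(x, w_sp):
    """Per-location weights of shape (1, T, H, W).

    Assumed simplification: a linear mix of the channel-wise mean and max maps
    replaces the 7x7 convolution used in CBAM.
    """
    avg = x.mean(axis=0)  # (T, H, W)
    mx = x.max(axis=0)    # (T, H, W)
    return sigmoid(w_sp[0] * avg + w_sp[1] * mx + w_sp[2])[None]


def dual_attention_residual(x, w1, w2, w_sp):
    """Channel attention, then spatial attention, then add the input back."""
    y = x * channel_attention(x, w1, w2)
    y = y * spatial_attention(y, w_sp)
    return x + y


rng = np.random.default_rng(0)
C, r = 16, 4  # channel count and a hypothetical reduction ratio
x = rng.standard_normal((C, 8, 14, 14))  # stands in for a 3D-conv output (C, T, H, W)
w1 = 0.1 * rng.standard_normal((C // r, C))
w2 = 0.1 * rng.standard_normal((C, C // r))
w_sp = np.array([0.5, 0.5, 0.0])
out = dual_attention_residual(x, w1, w2, w_sp)
print(out.shape)  # (16, 8, 14, 14)
```

Because both attention maps lie in (0, 1), the attended branch can only down-weight features; the residual addition keeps the original signal intact, so unhelpful attention cannot erase it.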

In recent years, the Internet of Things (IoT) has developed rapidly, and IoT devices are becoming increasingly intelligent. IoT terminal devices such as surveillance cameras play an irreplaceable role in modern society, and most of them integrate video action recognition and other intelligent functions. However, their performance is constrained by the limited computing resources of IoT terminal devices and by the lack of long-range non-linear temporal relation modeling and reverse motion information modeling. To address this problem, we introduce a Temporal Transformer Network with Self-supervision (TTSN). TTSN consists mainly of a temporal transformer module and a temporal sequence self-supervision module. The temporal transformer module models the non-linear temporal dependencies among non-local frames, which significantly enhances the representation of complex motion features. The temporal sequence self-supervision module adopts a streamlined “random batch random channel” strategy to reverse the order of video frames, which allows motion information to be extracted robustly from the reversed temporal dimension and improves the generalization capability of the model. Extensive experiments on three widely used datasets (HMDB51, UCF101, and Something-something V1) demonstrate that TTSN achieves state-of-the-art performance for video action recognition. Its low computational complexity and high performance make TTSN a candidate for application in IoT scenarios.

With the rapid development of the Internet of Things (IoT), more and more data is being disseminated in the form of video, which places new demands on the understanding and modeling of video data. In recent years, video action recognition based on 2D convolutional networks has gained wide popularity; however, because existing models lack long-range non-linear temporal relation modeling and reverse motion information modeling, their performance is seriously limited. To address this problem, we introduce a Temporal Transformer Network with Self-supervision (TTSN). TTSN consists mainly of a temporal transformer module and a temporal sequence self-supervision module. The temporal transformer module models the non-linear temporal dependencies among non-local frames, which significantly enhances the representation of complex motion features. The temporal sequence self-supervision module adopts a streamlined “random batch random channel” strategy to reverse the order of video frames, which allows motion information to be extracted robustly from the reversed temporal dimension and improves the generalization capability of the model. Extensive experiments on three widely used datasets (HMDB51, UCF101, and Something-something V1) demonstrate that TTSN achieves state-of-the-art performance for video action recognition. As a result, our work provides a new attention-based, self-supervised algorithm for processing video data in IoT.
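The TTSN abstracts describe two mechanisms without giving their details, so the following NumPy sketch illustrates each under stated assumptions rather than reproducing the authors' modules. The temporal transformer is reduced to single-head scaled dot-product self-attention over per-frame feature vectors (no multiple heads, positional encoding, or feed-forward layer). "Random batch random channel" is interpreted here as reversing the time axis for a random subset of channels inside a random subset of clips. All names (`temporal_self_attention`, `random_batch_random_channel_reverse`, `p_batch`, `p_channel`) are hypothetical.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def temporal_self_attention(frames, wq, wk, wv):
    """Single-head self-attention over per-frame features of shape (T, D).

    Every frame attends to every other frame, so dependencies between distant
    (non-local) frames are modeled in one step. Returns (output, attention).
    """
    q, k, v = frames @ wq, frames @ wk, frames @ wv  # (T, d), (T, d), (T, D)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))   # (T, T), rows sum to 1
    return frames + attn @ v, attn                   # residual connection


def random_batch_random_channel_reverse(x, p_batch=0.5, p_channel=0.5, rng=None):
    """Reverse the time axis of x (N, C, T, H, W) for a random subset of
    channels inside a random subset of batch samples.

    Returns the new tensor and an (N, C) boolean mask of reversed pairs; using
    that mask as a self-supervision target is an assumption.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, c = x.shape[:2]
    pick_batch = rng.random(n) < p_batch           # which clips are touched
    pick_channel = rng.random((n, c)) < p_channel  # which channels within them
    mask = pick_batch[:, None] & pick_channel      # (N, C)
    out = x.copy()
    out[mask] = x[mask][:, ::-1]                   # x[mask] is (k, T, H, W); flip T
    return out, mask


rng = np.random.default_rng(0)
T, D, d = 8, 32, 16
frames = rng.standard_normal((T, D))
y, attn = temporal_self_attention(
    frames,
    0.1 * rng.standard_normal((D, d)),
    0.1 * rng.standard_normal((D, d)),
    0.1 * rng.standard_normal((D, D)),
)
clips = rng.standard_normal((4, 3, T, 7, 7))  # (N, C, T, H, W)
reversed_clips, mask = random_batch_random_channel_reverse(clips, rng=rng)
print(y.shape, attn.shape, mask.shape)  # (8, 32) (8, 8) (4, 3)
```

Reversing only some channels of some clips, rather than whole clips, keeps most of each batch in natural temporal order while still exposing the network to reversed motion, which is one plausible reading of why the strategy is called "streamlined".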

Articles published on Action Recognition In Videos

Cross-modal alignment and translation for missing modality action recognition

Action class relation detection and classification across multiple video datasets

A heterogeneous two-stream network for human action recognition

A Jeap-BiLSTM Neural Network for Action Recognition

Video Action Recognition by Combining Spatial-Temporal Cues with Graph Convolutional Networks

AGPN: Action Granularity Pyramid Network for Video Action Recognition

Human Action Representation Learning Using an Attention-Driven Residual 3DCNN Network

Cross-domain video action recognition via adaptive gradual learning

Temporal information oriented motion accumulation and selection network for RGB-based action recognition

Temporal Transformer Networks With Self-Supervision for Action Recognition

Human action recognition in complex live videos using graph convolutional network

Advancing human action recognition: A hybrid approach using attention-based LSTM and 3D CNN

Frequency Selective Augmentation for Video Representation Learning

SkateboardAI: The Coolest Video Action Recognition for Skateboarding (Student Abstract)

FSformer: Fast-Slow Transformer for video action recognition

Video action recognition: A survey

High speed human action recognition using a photonic reservoir computer

Enhanced bare-bones particle swarm optimization based evolving deep neural networks

Enhancing Zero-Shot Action Recognition in Videos by Combining GANs with Text and Images

Self-supervised pretext task collaborative multi-view contrastive learning for video action recognition
