Action Recognition Task Research Articles

The primary goal of this study is to develop a deep neural network for action recognition that improves accuracy while minimizing computational cost. To this end, we propose a modified EMO-MoviNet-A2* architecture that integrates Evolving Normalization (EvoNorm), Mish activation, and optimal frame selection to improve the accuracy and efficiency of action recognition in videos. The asterisk indicates that the model also incorporates the stream buffer concept. The Mobile Video Network (MoviNet) belongs to a family of memory-efficient architectures discovered through Neural Architecture Search (NAS), which balances accuracy and efficiency by integrating spatial, temporal, and spatio-temporal operations. We apply the MoviNet model, pre-trained on the Kinetics dataset, to the UCF101 and HMDB51 datasets. On UCF101, we observed a generalization gap, with the model performing better on the training set than on the testing set. To address this issue, we replaced batch normalization with EvoNorm, which unifies normalization and activation functions. Key-frame selection was another area that required improvement, so we developed a novel technique called Optimal Frame Selection (OFS) that identifies key-frames within videos more effectively than random or dense frame selection. Combining OFS with the Mish nonlinearity resulted in a 0.8-1% improvement in accuracy in our UCF101 20-class experiment. The EMO-MoviNet-A2* model consumes 86% fewer FLOPs and approximately 90% fewer parameters on the UCF101 dataset, with a trade-off of 1-2% accuracy. Additionally, it achieves 5-7% higher accuracy on the HMDB51 dataset while requiring seven times fewer FLOPs and ten times fewer parameters compared to the reference model, Motion-Augmented RGB Stream (MARS).
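For readers unfamiliar with the Mish nonlinearity named in the abstract above, the following is a minimal scalar sketch of its standard definition, Mish(x) = x * tanh(softplus(x)). It is not the authors' implementation; the function names `softplus` and `mish` are chosen here for illustration, and a real model would use a framework's tensor version.

```python
import math


def softplus(x: float) -> float:
    # Numerically stable softplus, log(1 + exp(x)), that avoids overflow
    # for large positive x.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    # Standard Mish definition: x * tanh(softplus(x)).
    return x * math.tanh(softplus(x))


if __name__ == "__main__":
    for value in (-2.0, 0.0, 1.0, 5.0):
        print(value, mish(value))
```

Mish is smooth and non-monotonic: it is close to the identity for large positive inputs and approaches zero for large negative inputs.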

Fall prevention has become crucial in modern healthcare and in society for supporting healthy ageing and the daily activities of older people. Falls are mainly related to age and to health problems such as muscle weakness, cardiovascular conditions, and locomotive syndrome. Among elderly people, the number of falls increases every year, and a fall can become life-threatening if it is detected too late. Older people often take prescription medication after a fall, and in the Japanese community preventing suicide attempts by overdose is an urgent concern. Many researchers have worked on fall detection systems that observe and report falls in real time using handcrafted features and machine learning approaches. Existing methods can struggle to achieve satisfactory performance because of limited robustness and generality, high computational complexity, and sensitivity to illumination, data orientation, and camera view. We proposed a graph-based spatial-temporal convolutional and attention neural network (GSTCAN) to overcome these challenges and develop an advanced medical technology system. Spatial-temporal convolutional networks have recently proven efficient and effective in various fields, such as human activity recognition and text recognition. In our procedure, we first calculated the motion between consecutive frames, then constructed a graph and applied a graph-based spatial and temporal convolutional neural network to extract spatial and temporal contextual relationships among the joints. An attention module then selected effective features channel-wise. We repeated this block six times to form the GSTCAN and then fed the spatial-temporal features to the network. Finally, we applied a softmax function as a classifier and achieved accuracies of 99.93%, 99.74%, and 99.12% on the ImViA, UR-Fall, and FDD datasets, respectively. The high accuracy on all three datasets demonstrates the proposed system's superiority, efficiency, and generality.
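The first and last steps of the pipeline described above (motion between consecutive frames, and softmax classification) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the GSTCAN implementation: the abstract does not define the motion feature, so a first-order difference of 2D joint coordinates is assumed here, and the names `joint_motion` and `softmax` are hypothetical.

```python
import math
from typing import List, Tuple

Joint = Tuple[float, float]   # (x, y) coordinate of one skeleton joint
Frame = List[Joint]           # all joints of one video frame


def joint_motion(frames: List[Frame]) -> List[Frame]:
    # Assumption: motion is the per-joint displacement between frame t
    # and frame t + 1, which yields len(frames) - 1 motion frames.
    return [
        [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(prev, curr)]
        for prev, curr in zip(frames, frames[1:])
    ]


def softmax(logits: List[float]) -> List[float]:
    # Subtract the maximum logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    frames = [[(0.0, 0.0), (1.0, 1.0)], [(1.0, 2.0), (1.0, 0.5)]]
    print(joint_motion(frames))
    print(softmax([2.0, 0.5]))
```

The graph construction, the spatial and temporal graph convolutions, and the channel attention module are omitted because the abstract gives no detail about their configuration.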

Articles published on Action Recognition Task

Actor-Aware Self-Supervised Learning for Semi-Supervised Video Representation Learning

Improved ShuffleNetV2 for Action Recognition in BPPV Treatment

Phase Randomization: A data augmentation for domain adaptation in human action recognition

Self-Attention-Based Deep Convolution LSTM Framework for Sensor-Based Badminton Activity Recognition

Scene adaptive mechanism for action recognition

EMO-MoviNet: Enhancing Action Recognition in Videos with EvoNorm, Mish Activation, and Optimal Frame Selection for Efficient Mobile Deployment

A Novel Multiperson Activity Recognition Algorithm Based on Point Clouds Measured by Millimeter-Wave MIMO Radar

Glimpse and focus: Global and local-scale graph convolution network for skeleton-based action recognition

AGPN: Action Granularity Pyramid Network for Video Action Recognition

ProtoHAR: Prototype Guided Personalized Federated Learning for Human Activity Recognition

Cross-domain video action recognition via adaptive gradual learning

Dynamic Fall Detection Using Graph-Based Spatial Temporal Convolution and Attention Network

Video-Based Human Activity Recognition Using Deep Learning Approaches

Enhanced Spatial Stream of Two-Stream Network Using Optical Flow for Human Action Recognition

Novel Motion Patterns Matter for Practical Skeleton-Based Action Recognition

Human Action Recognition Using Key-Frame Attention-Based LSTM Networks

Multi-Granularity Anchor-Contrastive Representation Learning for Semi-Supervised Skeleton-Based Action Recognition

Cross-domain few-shot action recognition with unlabeled videos

Human Activity Recognition in the Presence of Occlusion

EventMix: An efficient data augmentation strategy for event-based learning
