Action Recognition Framework Research Articles

Purpose: Assembly action recognition plays an important role in assembly process monitoring and human-robot collaborative assembly. Previous works overlook the interaction between hands and operated objects and do not model subtle hand motions, which reduces accuracy in fine-grained action recognition. This paper aims to model hand-object interactions and hand movements to achieve high-accuracy assembly action recognition.

Design/methodology/approach: A novel multi-stream hand-object interaction network (MHOINet) is proposed for assembly action recognition. To learn the hand-object interaction relationship in an assembly sequence, an interaction modeling network (IMN) comprising both geometric and visual modeling is used in the interaction stream. The geometric part captures the spatial relation between the hand and the interacted parts/tools from their detected bounding boxes, and the visual part mines the visual context of hand and object at the pixel level through a position attention model. To model hand movements, a temporal enhancement module (TEM) with multiple convolution kernels is developed in the hand stream, which captures the temporal dependencies of hand sequences over short and long ranges. Finally, the assembly action is predicted by merging the outputs of the different streams through a weighted score-level fusion. A robotic arm component assembly dataset is created to evaluate the effectiveness of the proposed method.

Findings: The method achieves recognition accuracies of 97.31% and 95.32% for coarse and fine assembly actions, respectively, outperforming the other compared methods. Experiments on human-robot collaboration indicate that the method can be applied to industrial production.

Originality/value: The author proposes a novel framework for assembly action recognition, which simultaneously leverages the features of hands, objects and hand-object interactions. The TEM enhances the representation of hand dynamics and facilitates the recognition of assembly actions with various time spans. The IMN learns semantic information from hand-object interactions, which is significant for distinguishing fine assembly actions.
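The weighted score-level fusion mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the fusion formula, the stream weights, or how scores are normalized, so everything below is an assumption made for illustration: the function names (softmax, fuse_scores, predict_action), the three example streams, the softmax normalization, and the weight values are hypothetical and are not taken from MHOINet.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(stream_logits, weights):
    """Weighted average of per-stream class probabilities.

    stream_logits: one list of class scores per stream (assumed layout).
    weights: one non-negative weight per stream; rescaled here to sum to 1.
    """
    if len(stream_logits) != len(weights):
        raise ValueError("need exactly one weight per stream")
    weight_sum = sum(weights)
    norm_weights = [w / weight_sum for w in weights]
    probs = [softmax(logits) for logits in stream_logits]
    num_classes = len(probs[0])
    return [
        sum(w * p[c] for w, p in zip(norm_weights, probs))
        for c in range(num_classes)
    ]


def predict_action(stream_logits, weights):
    """Return the index of the class with the highest fused score."""
    fused = fuse_scores(stream_logits, weights)
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical scores for three action classes from three streams.
hand_stream = [2.0, 0.5, 0.1]
object_stream = [0.2, 1.5, 0.3]
interaction_stream = [0.1, 2.5, 0.2]
example_weights = [0.3, 0.3, 0.4]  # illustrative values only

fused_scores = fuse_scores(
    [hand_stream, object_stream, interaction_stream], example_weights
)
predicted = predict_action(
    [hand_stream, object_stream, interaction_stream], example_weights
)
```

In this toy input the hand stream alone favors the first class, while the object and interaction streams favor the second, so the fused prediction follows the two agreeing streams. Fusing at the score level keeps each stream independent until the final step, which is one plausible reason to prefer it over feature-level concatenation.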

Surveillance cameras are now installed in most public places to control crime and ensure urban safety and security. However, automating Human Activity Recognition (HAR) with computer vision techniques faces several challenges, such as low lighting, complex spatiotemporal features, cluttered backgrounds, and inefficient use of surveillance system resources. Existing HAR approaches rely on straightforward networks that analyze either spatial or motion patterns, which limits performance, while dual-stream methods are based entirely on Convolutional Neural Networks (CNNs), which are inadequate for learning long-range temporal information. To overcome these challenges, this paper proposes an optimized dual-stream framework for HAR that consists of three main steps. First, a shots segmentation module makes efficient use of surveillance system resources by enhancing the low-light video stream and then detecting salient video frames that contain humans. This module is trained on the authors' own challenging Lowlight Human Surveillance Dataset (LHSD), which includes both normal data and data at different levels of low lighting, so that humans can be recognized in complex, uncertain environments. Next, to learn HAR from both contextual and motion information, a dual-stream approach is used for feature extraction. The first stream freezes the learned weights of the backbone Vision Transformer (ViT) B-16 model to select discriminative contextual information. In the second stream, the ViT features are fused with the intermediate encoder layers of the FlowNet2 optical flow model to extract a robust motion feature vector. Finally, a two-stream Parallel Bidirectional Long Short-Term Memory (PBiLSTM) is proposed for sequence learning to capture the global semantics of activities, followed by Dual Stream Multi-Head Attention (DSMHA) with a late fusion strategy to refine the large feature vector for accurate HAR. To assess the proposed framework, extensive experiments are conducted on real-world surveillance scenarios and several benchmark HAR datasets, achieving accuracies of 78.6285%, 96.0151%, and 98.875% on HMDB51, UCF101, and YouTube Action, respectively. The results show that the proposed strategy outperforms State-of-the-Art (SOTA) methods, providing accurate and reliable recognition of human activities in surveillance systems.

Articles published on Action Recognition Framework

Hypergraph-Based Multi-View Action Recognition Using Event Cameras.

A Novel Active Learning Framework for Cross-Subject Human Activity Recognition from Surface Electromyography.

A novel multi-stream hand-object interaction network for assembly action recognition

Data augmentation aided excavator activity recognition using deep convolutional conditional generative adversarial networks

Hybrid attentive prototypical network for few-shot action recognition

FTAN: Frame-to-frame temporal alignment network with contrastive learning for few-shot action recognition

An end-to-end hand action recognition framework based on cross-time mechanomyography signals

AGAR - Attention Graph-RNN for Adaptative Motion Prediction of Point Clouds of Deformable Objects

Human action recognition with transformer based on convolutional features

Temporal cues enhanced multimodal learning for action recognition in RGB-D videos

Multi-head CNN-based activity recognition and its application on chest-mounted sensor-belt

AP-TransNet: a polarized transformer based aerial human action recognition framework

A Multimodal, Multi-Task Adapting Framework for Video Action Recognition

Shots segmentation-based optimized dual-stream framework for robust human activity recognition in surveillance video

Residual deep gated recurrent unit-based attention framework for human activity recognition by exploiting dilated features

A human activity recognition framework in videos using segmented human subject focus

Subsampled Randomized Hadamard Transformation-based Ensemble Extreme Learning Machine for Human Activity Recognition

E-BabyNet: Enhanced Action Recognition of Infant Reaching in Unconstrained Environments.

SecureSense: Defending Adversarial Attack for Secure Device-Free Human Activity Recognition

Federated Learning Framework for Human Activity Recognition Using Smartphones
