Abstract

Human action recognition methods in videos based on deep convolutional neural networks usually use random cropping or its variants for data augmentation. However, this traditional data augmentation approach may generate many non-informative samples (video patches covering only a small part of the foreground or only the background) that are not related to a specific action. These samples can be regarded as noisy samples with incorrect labels, which reduces overall action recognition performance. In this paper, we attempt to mitigate the impact of noisy samples by proposing an Auto-augmented Siamese Neural Network (ASNet). In this framework, we propose backpropagating salient patches and randomly cropped samples in the same iteration to perform gradient compensation, alleviating the adverse gradient effects of non-informative samples. Salient patches refer to samples containing critical information for human action recognition. The generation of salient patches is formulated as a Markov decision process, and a reinforcement learning agent called SPA (Salient Patch Agent) is introduced to extract patches in a weakly supervised manner without extra labels. Extensive experiments were conducted on two well-known datasets, UCF-101 and HMDB-51, to verify the effectiveness of the proposed SPA and ASNet.

Highlights

  • Video-based human action recognition is one of the key tasks in video understanding

  • We addressed the issue of using random cropping methods for data augmentation in convolutional neural network (CNN)-based video action recognition: noisy samples generated through random cropping will adversely affect the performance of the trained action recognition model

Summary

Introduction

Video-based human action recognition is one of the key tasks in video understanding. It has a wide range of applications [1,2,3,4,5] in intelligent surveillance, health care, human–computer interaction, robot learning, etc. It is found that data augmentation methods based on random cropping often generate non-informative samples (video patches covering only a small part of the foreground or only the background). These samples can be considered noisy samples with incorrect labels: such samples generated through random cropping will adversely affect the performance of the trained action recognition model. We addressed this issue by proposing a Siamese neural network architecture (ASNet) that reduces the negative impact of non-informative samples through gradient compensation. In ASNet, the CNN in the context stream receives input from random-cropping-based data augmentation, while the CNN in the saliency stream receives salient patches from the SPA.
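The gradient-compensation idea above can be illustrated with a minimal PyTorch sketch. This is an assumption-laden illustration, not the authors' implementation: `ASNetSketch`, `train_step`, the shared `backbone`, and the plain sum of the two losses are all hypothetical names and simplifications; the point is only that the randomly cropped clip and the salient patch pass through the same shared-weight network and contribute gradients in the same iteration.

```python
import torch
import torch.nn as nn

class ASNetSketch(nn.Module):
    """Hypothetical two-stream Siamese sketch: both streams share one CNN."""
    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.backbone = backbone  # shared weights (Siamese)

    def forward(self, context_clip, salient_patch):
        # Context stream: randomly cropped sample; saliency stream: SPA patch.
        return self.backbone(context_clip), self.backbone(salient_patch)

def train_step(model, optimizer, context_clip, salient_patch, labels):
    """One iteration: losses from both streams are backpropagated together,
    so the salient-patch gradient compensates noisy random-crop gradients."""
    criterion = nn.CrossEntropyLoss()
    logits_ctx, logits_sal = model(context_clip, salient_patch)
    loss = criterion(logits_ctx, labels) + criterion(logits_sal, labels)
    optimizer.zero_grad()
    loss.backward()  # gradients from both samples accumulate in one pass
    optimizer.step()
    return loss.item()
```

In a real setting the backbone would be a video CNN and the two loss terms might be weighted; a plain sum is used here only to keep the sketch minimal.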

Deep Learning-Based Action Recognition
Data Augmentation
Saliency Detection for Action Recognition
Deep Reinforcement Learning in Action Recognition
ASNet Framework
Model Formulation
Salient Patch Agent
State and Action Space
Reward
Training of Salient Patch Agent
Datasets
Training of CNN
Training of ASNet
Inference Details
Comparison with Different Cropping Strategies
ASNet with Different Backbones
ASNet with Different Feature Fusion Strategies
Hyperparameters
Analysis of ASNet
Exploration of ASNet Architecture
Visualization of ASNet
Comparison with the State of the Art
Findings
Conclusions
