Abstract

The two-stream convolutional neural network (CNN) has proven highly successful for action recognition in videos. The main idea is to train two CNNs to learn spatial and temporal features separately, and to combine their scores to obtain the final prediction. In the literature, we observed that most methods use similar CNNs for the two streams. In this paper, we design a two-stream CNN architecture that uses different CNNs for the two streams to learn spatial and temporal features. Temporal Segment Networks (TSN) are applied to retrieve long-range temporal features and to differentiate similar sub-actions in videos. Data augmentation techniques are employed to prevent over-fitting. Advanced cross-modal pre-training is discussed and introduced into the proposed architecture to enhance action recognition accuracy. The proposed two-stream model is evaluated on two challenging action recognition datasets: HMDB-51 and UCF-101. The results show a significant performance increase, and the proposed architecture outperforms existing methods.
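
As a rough illustration (not the authors' code), the following PyTorch sketch shows the late-fusion idea described above: a spatial stream sees a sampled RGB frame, a temporal stream sees stacked optical flow, and their softmax scores are combined. The function names, the fusion weights, and the use of ResNet-50 for both streams are illustrative assumptions; the paper pairs different backbones (e.g., ResNet-50 and Inception-V2) for the two streams.

# Minimal two-stream late-fusion sketch (illustrative, not the paper's code).
# Assumptions: ResNet-50 for both streams for brevity; optical flow is
# pre-computed and stacked as a 2*L-channel tensor (L = 10 flow frames here).

import torch
import torch.nn as nn
from torchvision import models

FLOW_CHANNELS = 2 * 10  # x/y displacement maps for 10 consecutive flow frames

def make_spatial_stream(num_classes: int) -> nn.Module:
    """Spatial stream: a standard RGB ResNet-50."""
    net = models.resnet50(weights=None)
    net.fc = nn.Linear(net.fc.in_features, num_classes)
    return net

def make_temporal_stream(num_classes: int) -> nn.Module:
    """Temporal stream: ResNet-50 with its first conv widened to accept
    stacked optical flow instead of 3-channel RGB."""
    net = models.resnet50(weights=None)
    net.conv1 = nn.Conv2d(FLOW_CHANNELS, 64, kernel_size=7,
                          stride=2, padding=3, bias=False)
    net.fc = nn.Linear(net.fc.in_features, num_classes)
    return net

@torch.no_grad()
def fused_prediction(spatial, temporal, rgb, flow,
                     w_spatial=1.0, w_temporal=1.5):
    """Late fusion: weighted average of the two streams' class scores."""
    s = torch.softmax(spatial(rgb), dim=1)
    t = torch.softmax(temporal(flow), dim=1)
    return (w_spatial * s + w_temporal * t) / (w_spatial + w_temporal)

if __name__ == "__main__":
    num_classes = 101  # e.g., UCF-101
    spatial = make_spatial_stream(num_classes).eval()
    temporal = make_temporal_stream(num_classes).eval()
    rgb = torch.randn(1, 3, 224, 224)               # one sampled RGB frame
    flow = torch.randn(1, FLOW_CHANNELS, 224, 224)  # stacked optical flow
    print(fused_prediction(spatial, temporal, rgb, flow).argmax(dim=1))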

Highlights

  • Human Action Recognition is an emerging research area that has gained prominent attention in computer vision

  • We propose a two-stream convolutional neural network (CNN) model, built on the two-stream network design, for identifying actions in videos

  • We evaluate the experiments with Residual Network (ResNet)-50 and Inception-V2 models to verify the efficiency of the advanced cross-modal pre-training technique


Summary

Introduction

Human Action Recognition is an emerging research area that has gained prominent attention in computer vision. Earlier methods are able to utilize the temporal component, but only over short durations; in lengthy videos, information cannot persist for a long time. To solve this problem, Wang et al. [6] designed a video-level segmental architecture, called Temporal Segment Networks, that efficiently learns features and retrieves long-range time-varying features from videos. Other methods proposed in [5,6,7,8,9,10,11] utilize similar network models for the two streams for human action recognition in videos. Inspired by the two-stream processing of the human visual cortex, we propose a two-stream CNN architecture for action recognition in videos. A segment-based temporal modeling technique is used to better capture long-range temporal information.
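
For illustration, the sketch below (assumed names, not the TSN reference implementation) shows the two ingredients of segment-based temporal modeling: dividing the video into equal segments, sampling one snippet from each, and averaging the per-snippet class scores via a consensus function.

# TSN-style sparse sampling and average consensus (illustrative sketch).
# `SegmentConsensus` and `sample_snippet_indices` are hypothetical names.

import random
import torch
import torch.nn as nn

def sample_snippet_indices(num_frames: int, num_segments: int = 3):
    """Divide the video into equal segments and draw one random frame
    index from each segment (the TSN sparse-sampling scheme)."""
    seg_len = num_frames // num_segments
    return [random.randrange(k * seg_len, (k + 1) * seg_len)
            for k in range(num_segments)]

class SegmentConsensus(nn.Module):
    """Apply a shared snippet network to each sampled snippet and average
    the resulting class scores (the segmental consensus function)."""
    def __init__(self, snippet_net: nn.Module):
        super().__init__()
        self.snippet_net = snippet_net

    def forward(self, snippets: torch.Tensor) -> torch.Tensor:
        # snippets: (batch, num_segments, C, H, W)
        b, s = snippets.shape[:2]
        scores = self.snippet_net(snippets.flatten(0, 1))  # (b*s, classes)
        return scores.view(b, s, -1).mean(dim=1)           # average consensus

if __name__ == "__main__":
    print(sample_snippet_indices(num_frames=300, num_segments=3))
    toy_net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 51))  # toy stand-in CNN
    model = SegmentConsensus(toy_net)
    clips = torch.randn(2, 3, 3, 8, 8)  # (batch=2, segments=3, C=3, H=8, W=8)
    print(model(clips).shape)           # -> torch.Size([2, 51])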

Related Works
Space-Time Networks
Hybrid Networks
Two-Stream Networks
Technical Approach
Distinct Two-Stream Convolution Networks
Base Networks
Residual Network
Inception-V2
Segment-Based Temporal Modeling
Data Augmentation
Advanced Cross-Modal Pre-Training
Experiments
Datasets and Implementation Details
Testing
Exploration Study
Comparison with State-of-the-Art
Findings
Conclusions
