DynamoNet: Dynamic Action and Motion Network

Ali Diba, Luc Van Gool, Rainer Stiefelhagen, Vivek Sharma

doi:10.1109/iccv.2019.00629

Abstract

In this paper, we are interested in self-supervised learning of motion cues in videos using dynamic motion filters, with the aim of obtaining a better motion representation and ultimately boosting human action recognition in particular. Thus far, the vision community has focused on spatio-temporal approaches with standard filters; here we instead propose dynamic filters that adaptively learn a video-specific internal motion representation by predicting short-term future frames. We name this new representation the dynamic motion representation (DMR). It is embedded inside a 3D convolutional network as a new layer, which captures the visual appearance and motion dynamics throughout the entire video clip via end-to-end network learning. Simultaneously, we use this motion representation to enrich video classification: the frame prediction task is designed as an auxiliary task to empower the classification problem. To this end, we introduce a novel unified spatio-temporal 3D-CNN architecture (DynamoNet) that jointly optimizes video classification and motion representation learning by predicting future frames, formulated as a multi-task learning problem. We conduct experiments on challenging human action datasets: Kinetics 400, UCF101, and HMDB51. The experiments using the proposed DynamoNet show promising results on all the datasets.
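
To make the two ideas in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It illustrates (1) a dynamic filter, here a separate 3x3 kernel applied at every pixel of the last observed frame to predict the next one, and (2) a multi-task objective that adds a frame-prediction term to the classification loss. The abstract does not specify the loss functions or their weighting, so cross-entropy, mean squared error, the `weight` parameter, and all function names are assumptions made for illustration. In DynamoNet the kernels would be produced by the 3D-CNN itself; here they are written by hand.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Classification loss for one clip (assumed; not stated in the abstract)."""
    return -math.log(softmax(logits)[label])


def apply_dynamic_filters(frame, filters):
    """Apply a different 3x3 kernel at every pixel, with zero padding.

    frame:   H x W nested list of floats (one grayscale frame).
    filters: H x W nested list of 9-element kernels, row-major,
             so index 4 is the centre tap and index 3 the left neighbour.
    """
    h, w = len(frame), len(frame[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            kernel = filters[y][x]
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[(dy + 1) * 3 + (dx + 1)] * frame[yy][xx]
            out[y][x] = acc
    return out


def mse(a, b):
    """Mean squared error between two H x W frames (assumed prediction loss)."""
    n = len(a) * len(a[0])
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)) / n


def joint_loss(logits, label, predicted_frame, target_frame, weight=1.0):
    """Multi-task objective: classification plus weighted frame prediction.

    `weight` is a hypothetical balancing hyperparameter.
    """
    return cross_entropy(logits, label) + weight * mse(predicted_frame, target_frame)


if __name__ == "__main__":
    # A vertical bar that moves one pixel to the right between two frames.
    last_frame = [[0.0, 1.0, 0.0]] * 3
    next_frame = [[0.0, 0.0, 1.0]] * 3

    # Hand-written kernel that copies each pixel's left neighbour.
    shift_right = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    filters = [[shift_right for _ in range(3)] for _ in range(3)]

    predicted = apply_dynamic_filters(last_frame, filters)
    print(predicted)
    print(joint_loss([2.0, 0.0], 0, predicted, next_frame))
```

Because the kernels can differ per pixel and per clip, the same layer can express different motions for different videos, which is the property that distinguishes dynamic filters from standard convolution weights that are fixed after training. The frame-prediction term needs no labels, since the target is simply the next frame of the clip.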
