Spatio-Temporal Representation Matching-Based Open-Set Action Recognition by Joint Learning of Motion and Appearance

Yongsang Yoon, Jongmin Yu, Moongu Jeon

doi:10.1109/access.2019.2953455

Abstract

In this paper, we propose spatio-temporal representation matching (STRM) for video-based action recognition under the open-set condition. Open-set action recognition is more challenging than closed-set action recognition because samples of untrained action classes must be recognized, and most conventional frameworks are likely to give false predictions for them. To handle untrained action classes, STRM jointly learns motion and appearance. It extracts spatio-temporal representations (ST-representations) from video clips through a joint learning pipeline that uses both motion and appearance information. STRM then computes the similarities between ST-representations and selects the one with the highest similarity. We set an experimental protocol for open-set action recognition and carried out experiments on UCF101 and HMDB51 to evaluate STRM. We first investigated the effects of different hyper-parameter settings on STRM, and then compared its performance with existing state-of-the-art methods. The experimental results showed that the proposed method not only outperformed existing methods under the open-set condition, but also provided performance comparable to state-of-the-art methods under the closed-set condition.

Highlights

Action recognition is one of the most challenging aspects of computer vision research, because the complexity and variety of human behaviors make recognition difficult
We propose a spatio-temporal representation (ST-representation) matching (STRM) method based on joint learning of motion and appearance
Several recent methods have focused on modeling long-range temporal structure by combining 2D convolution with a Recurrent Neural Network (RNN) or Long Short-Term Memory (LSTM), such as [34], [40]

Summary

INTRODUCTION

Action recognition is one of the most challenging aspects of computer vision research, because the complexity and variety of human behaviors make recognition difficult. [...] object detection [21], [22], image classification [23], [24], and pose estimation [25]–[28]. Action recognition is a difficult problem in itself because of the complexity and variability of human actions, and the open-set condition makes it even harder because it involves unconfined action categories. To resolve this issue, we propose a spatio-temporal representation (ST-representation) matching (STRM) method based on joint learning of motion and appearance. The open-set action recognition process using STRM is as follows: Initially, STRM extracts joint spatio-temporal representations (joint ST-representations) from a given video.

RELATED WORKS

JOINT SPATIO-TEMPORAL REPRESENTATION EXTRACTION

11: Select the action class to which the highest similarity s_i belongs
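The matching step named in the abstract (compute similarities between ST-representations, then pick the class with the highest one) can be sketched roughly as below. This is only an illustration under stated assumptions, not the authors' implementation: the use of cosine similarity, the `gallery` structure of labeled reference vectors, and the function names `cosine_similarity` and `match_action` are all hypothetical, and the vectors stand in for ST-representations that a trained network would produce.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def match_action(query, gallery):
    """Return (label, similarity) of the gallery entry most similar to query.

    gallery is a hypothetical list of (label, representation) pairs; its
    labels may belong to classes that were never used for training.
    """
    if not gallery:
        raise ValueError("gallery must contain at least one reference")
    best_label, best_score = None, -math.inf
    for label, representation in gallery:
        score = cosine_similarity(query, representation)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


# Toy vectors standing in for ST-representations.
gallery = [("wave", [1.0, 0.0, 0.0]), ("jump", [0.0, 1.0, 0.0])]
label, score = match_action([0.9, 0.1, 0.0], gallery)
print(label, round(score, 3))
```

Because the decision depends only on similarity to reference representations, a class can be added by supplying a reference for it in the gallery, without changing this matching code.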

LEARNING STRM

EXPERIMENTS

EXPERIMENTAL SETTING AND DATASET

Findings

CONCLUSION

Journal: IEEE access : practical innovations, open solutions	Publication Date: Jan 1, 2019
Citations: 5	License type: CC BY 4.0
