Abstract

Representing the features of different types of human action in unconstrained videos is a challenging task due to camera motion, cluttered backgrounds, and occlusions. This paper aims to obtain an effective and compact action representation using the length-variable edge trajectory (LV-ET) and the spatio-temporal motion skeleton (STMS). First, to better describe long-term motion information for action representation, we introduce a novel edge-based trajectory extraction strategy that tracks edge points from motion without limiting the length of the trajectory; the end of tracking depends not only on the optical flow field but also on the position of the optical flow vector in the next frame. As a result, only a compact subset of action-related edge points in each frame is used to generate length-variable edge trajectories. Second, observing that different types of action have their own characteristic trajectories, we introduce a new trajectory descriptor named the spatio-temporal motion skeleton: the LV-ET is first encoded using both orientation and magnitude features, and the STMS is then computed by motion clustering. Comparative experiments on three unconstrained human action datasets demonstrate the effectiveness of our method.
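To make the idea of encoding a variable-length trajectory with orientation and magnitude features more concrete, the following is a minimal illustrative sketch, not the paper's actual LV-ET encoding or STMS computation. It assumes a trajectory is given as a list of (x, y) points and summarizes it as a magnitude-weighted orientation histogram; the function name `encode_trajectory` and the choice of 8 orientation bins are hypothetical.

```python
import math


def encode_trajectory(points, n_bins=8):
    """Summarize a trajectory of any length as a fixed-size histogram.

    Assumption for illustration only: each frame-to-frame displacement
    votes into an orientation bin, weighted by its magnitude. This is
    not the encoding defined in the paper.
    """
    hist = [0.0] * n_bins
    bin_width = 2 * math.pi / n_bins
    # Walk over consecutive point pairs (frame t, frame t + 1).
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        magnitude = math.hypot(dx, dy)
        if magnitude == 0.0:
            continue  # a static point carries no orientation
        angle = math.atan2(dy, dx) % (2 * math.pi)  # map to [0, 2*pi)
        # Clamp guards against rounding that lands exactly on 2*pi.
        idx = min(int(angle / bin_width), n_bins - 1)
        hist[idx] += magnitude
    total = sum(hist)
    # L1-normalize so trajectories of different lengths are comparable.
    return [h / total for h in hist] if total > 0 else hist


short_track = [(0, 0), (3, 1), (6, 2)]
long_track = [(0, 0), (3, 1), (6, 2), (7, 4), (8, 6), (9, 8)]
print(len(encode_trajectory(short_track)), len(encode_trajectory(long_track)))
```

Because the histogram has a fixed number of bins, trajectories of different lengths map to vectors of the same size, which is one simple way such descriptors could be compared or clustered.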

Highlights

  • Human action recognition (HAR) is an active research topic in intelligent video analysis that has gained extensive attention in academic and engineering communities [1, 2] and is widely used in human-computer interaction, video surveillance, motion analysis, virtual reality, and other fields [3,4,5,6,7]

  • Once the length-variable edge trajectory (LV-ET) has been extracted, the spatio-temporal motion skeleton (STMS) can be obtained from it

  • The performance of the proposed LV-ET and STMS descriptor is evaluated on three challenging unconstrained HAR datasets: UCF Sports [34], YouTube [35], and HMDB51 [36]; Fig. 8 shows some examples from these datasets

  • In this paper, a new trajectory generation strategy, LV-ET, is proposed and a novel descriptor, STMS, is designed for human action recognition


Summary

Introduction

Human action recognition (HAR) is an active research topic in intelligent video analysis that has gained extensive attention in academic and engineering communities [1, 2] and is widely used in human-computer interaction, video surveillance, motion analysis, virtual reality, and other fields [3,4,5,6,7]. HAR is realized in two steps: the first is feature extraction from the video information; the second is classification based on the resulting feature vectors. Trajectory-based methods have been proposed and used in a variety of human action recognition tasks. Previous studies, however, ignore the motion features of the tracked points and the differences between various types of action. To address this issue, we propose the length-variable edge trajectory extraction method. We regard the various edge trajectories with analogous spatio-temporal and motion features as a set of

