Abstract

Video-based human action recognition aims to understand human actions and behaviours in video sequences, and has wide applications in health care, human-machine interaction and other areas. Metric learning, which learns a similarity metric, plays an important role in human action recognition. However, learning a full-rank matrix is usually inefficient and easily leads to overfitting. A common way to overcome these issues is to impose a low-rank constraint on the learned matrix. This paper proposes a novel Cholesky decomposition based metric learning (CDML) method for effective video-based human action recognition. Firstly, the improved dense trajectories technique and the vector of locally aggregated descriptors (VLAD) are used for feature detection and feature encoding, respectively. Then, considering the high dimensionality of VLAD features, we propose to learn a similarity matrix by taking advantage of Cholesky decomposition, which decomposes the matrix into the product of a lower triangular matrix and its transpose. Unlike traditional low-rank metric learning methods, which explicitly impose a low-rank constraint on the learned matrix, the proposed algorithm achieves such a constraint by controlling the rank of the lower triangular matrix, thus leading to high computational efficiency. Experimental results on a public video dataset show that the proposed method achieves superior performance compared with several state-of-the-art methods.

Highlights

  • Video-based human action recognition aims to recognize and understand the actions and behaviours in video sequences

  • In the health care domain, video-based human action recognition can automatically recognize human actions in real time, so that the corresponding medical rescue can be provided in a timely manner

  • We propose a novel Cholesky decomposition based metric learning (CDML) method for effective video-based human action recognition

Summary

INTRODUCTION

Video-based human action recognition aims to recognize and understand the actions and behaviours in video sequences. Several methods [9], [10], [28], [30] rely on deep learning to perform video-based action recognition. Wang et al. [15] developed a novel angular loss for deep metric learning, which greatly improves on the traditional triplet loss by imposing geometric constraints on triplets. These methods suffer from high computational complexity. We propose a novel Cholesky decomposition based metric learning (CDML) method for effective video-based human action recognition. Considering that VLAD features have high dimensionality, we further propose to learn a similarity metric based on Cholesky decomposition to match sample pairs. Experimental results show that our method achieves competitive performance compared with state-of-the-art methods on a challenging video-based human action recognition dataset.
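The low-rank idea behind a Cholesky-style factorization can be illustrated with a minimal NumPy sketch. It is not the CDML training procedure, whose objective and optimizer are not given in this summary. It only shows the parameterization M = L Lᵀ with a d × r lower-trapezoidal factor L, so that the rank of M is at most r by construction and distances can be computed without forming the d × d matrix. The helper names, the dimensions and the random initialization are assumptions made for illustration.

```python
import numpy as np


def low_rank_factor(d, r, rng):
    """Hypothetical helper: random d x r factor with zeros above the main diagonal."""
    return np.tril(rng.standard_normal((d, r)))


def squared_distance(L, x, y):
    """Return (x - y)^T L L^T (x - y) by projecting the difference onto r dimensions."""
    z = L.T @ (x - y)
    return float(z @ z)


rng = np.random.default_rng(0)
d, r = 8, 3  # assumed feature dimension and target rank (illustrative values)

L = low_rank_factor(d, r, rng)
M = L @ L.T  # symmetric positive semidefinite, rank at most r

x = rng.standard_normal(d)
y = rng.standard_normal(d)
print(squared_distance(L, x, y))
```

Evaluating the distance through L costs O(dr) per pair instead of O(d²), which is the kind of saving that matters when the encoded features are high dimensional.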

FEATURE EXTRACTION OF ACTION RECOGNITION
Findings
CONCLUSION