Abstract

Action recognition in realistic scenes is a challenging task in the field of computer vision. Although trajectory-based methods have demonstrated promising performance, background trajectories cannot be filtered out effectively, which reduces the ratio of valid trajectories. To address this issue, we propose a saliency-based sampling strategy named foreground trajectories on multiscale hybrid masks (HM-FTs). First, the motion boundary images of each frame are calculated to derive the initial masks. According to the characteristics of action videos, image priors and a synchronous updating mechanism based on cellular automata are exploited to generate an optimized weak saliency map, which is then integrated with a strong saliency map obtained via the multiple kernels boosting (MKB) algorithm. Then, multiscale hybrid masks are obtained through a collaborative optimization strategy and mask intersection. Compensation schemes are designed to extract a set of foreground trajectories that are closely related to human actions. Finally, a hybrid fusion framework that combines trajectory features and pose features is constructed to enhance recognition performance. Experimental results on two benchmark datasets demonstrate that the proposed method is effective and improves upon most state-of-the-art algorithms.
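The paper does not include code, but the first step of the pipeline (computing motion boundaries from a dense optical-flow field and thresholding them into an initial mask) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the flow field is assumed to be given (e.g., from any dense optical-flow estimator), and the relative threshold `thresh` is a hypothetical parameter.

```python
import numpy as np

def motion_boundary_mask(flow, thresh=0.5):
    """Derive a binary initial mask from a dense optical-flow field.

    flow: (H, W, 2) array of per-pixel (u, v) displacements.
    Motion boundaries are locations where the flow changes sharply,
    i.e., where the spatial gradients of u and v are large.
    """
    u, v = flow[..., 0], flow[..., 1]
    # Spatial gradients of each flow component.
    uy, ux = np.gradient(u)
    vy, vx = np.gradient(v)
    # Motion-boundary magnitude: combined gradient energy of the flow.
    mb = np.sqrt(ux**2 + uy**2 + vx**2 + vy**2)
    # Threshold relative to the strongest boundary to get an initial mask.
    return mb > thresh * mb.max()

# Toy example: a square region translating over a static background.
flow = np.zeros((32, 32, 2))
flow[8:24, 8:24, 0] = 3.0   # square moving 3 px to the right
mask = motion_boundary_mask(flow)
```

In this toy case the mask fires only around the edges of the moving square, where the flow is discontinuous; the uniformly moving interior and the static background are both suppressed, which is what makes motion boundaries a useful seed for foreground masks.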

Highlights

  • Human action recognition from videos is one of the research hotspots in the field of computer vision

  • To obtain trajectories closely related to the action subject and to filter out trajectories derived from camera motion and inherent background movement, a saliency-based sampling strategy named foreground trajectories on multiscale hybrid masks (HM-FTs) is proposed

  • According to the characteristics of action videos, a foreground region detection algorithm is presented, using a weak saliency map optimized by the synchronous updating mechanism of cellular automata and a strong saliency map obtained through the multiple kernels boosting (MKB) method
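To make the synchronous updating idea concrete, here is a simplified sketch of a cellular-automata refinement step on a weak saliency map. It is not the authors' rule: the blending weight `lam`, the 4-neighbourhood, and the fixed step count are illustrative assumptions; the defining property kept from the paper is that every cell is updated simultaneously from its neighbours in each pass.

```python
import numpy as np

def ca_refine(saliency, steps=5, lam=0.6):
    """Synchronously refine a weak saliency map with a cellular-automata rule.

    Each cell keeps a fraction `lam` of its own saliency and absorbs the
    rest from the mean of its 4-neighbours; all cells are updated in one
    pass per step (synchronous updating).
    """
    s = saliency.astype(float).copy()
    for _ in range(steps):
        # Mean of the 4-neighbourhood, with edge replication at the border.
        p = np.pad(s, 1, mode='edge')
        neigh = (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]) / 4.0
        s = lam * s + (1.0 - lam) * neigh   # every cell updated at once
    return np.clip(s, 0.0, 1.0)

# Noisy weak saliency map: one salient blob plus background speckle.
rng = np.random.default_rng(0)
weak = rng.random((16, 16)) * 0.2
weak[5:11, 5:11] = 0.9
refined = ca_refine(weak)
```

After a few synchronous passes the coherent salient region retains high values while isolated speckle is averaged away, which is the effect the optimized weak saliency map relies on before fusion with the strong (MKB) map.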


Introduction

Human action recognition from videos is one of the research hotspots in the field of computer vision. Existing methods are divided into two categories according to feature type,[1] i.e., handcrafted methods and deep-learned methods. Yilmaz and Shah[10] exploited contour information to extract a three-dimensional (3-D) spatiotemporal volume (STV) and treated the peak, valley, and saddle points on the STV surface as an expression of human behaviors. Sadanand and Corso[5] generated cascaded features based on time-space pyramids, which were used as action representations to train a variety of templates and construct a behavior warehouse named action bank.


