Abstract

To achieve satisfactory performance in human action recognition, a central task is to address the sub-action sharing problem, especially among similar action classes. However, most existing convolutional neural network (CNN)-based action recognition algorithms divide a video uniformly into frames and then select frames at random as inputs, ignoring the distinct characteristics of different frames. In recent years, depth videos have been increasingly used for action recognition, but most methods focus only on the spatial information of different actions without exploiting temporal information. To address these issues, a novel energy-guided temporal segmentation method is proposed here, and a multimodal fusion strategy is combined with the proposed segmentation method to construct an energy-guided temporal segmentation network (EGTSN). Specifically, the EGTSN has two parts: energy-guided video segmentation and a multimodal fusion heterogeneous CNN. The proposed solution is evaluated on the public large-scale NTU RGB+D dataset. Comparisons with state-of-the-art methods demonstrate the effectiveness of the proposed network.
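To make the segmentation idea concrete, the sketch below shows one plausible reading of energy-guided temporal segmentation, assuming per-frame motion energy is approximated by inter-frame differencing and segment boundaries are placed so each segment carries roughly equal cumulative energy. The function names (`motion_energy`, `energy_guided_segments`) and the equal-energy criterion are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def motion_energy(frames: np.ndarray) -> np.ndarray:
    """Per-frame motion energy, approximated here by the mean absolute
    inter-frame difference (an assumption, not the paper's exact measure)."""
    diffs = np.abs(np.diff(frames.astype(np.float32), axis=0))
    energy = diffs.mean(axis=(1, 2))           # one value per frame transition
    return np.concatenate([[0.0], energy])     # pad so len(energy) == len(frames)

def energy_guided_segments(frames: np.ndarray, num_segments: int) -> list[np.ndarray]:
    """Split a video into segments of roughly equal cumulative motion energy,
    instead of the uniform division used by standard frame sampling."""
    cum = np.cumsum(motion_energy(frames))
    cum /= cum[-1]                              # normalize cumulative energy to [0, 1]
    # Place a boundary where cumulative energy crosses k / num_segments.
    bounds = [np.searchsorted(cum, k / num_segments) for k in range(1, num_segments)]
    return np.split(frames, bounds)

# Usage: sample one frame per energy-balanced segment as CNN input.
video = np.random.randint(0, 255, size=(120, 224, 224), dtype=np.uint8)
segments = energy_guided_segments(video, num_segments=3)
inputs = [seg[np.random.randint(len(seg))] for seg in segments]
```

Under this reading, high-motion portions of a video receive more segments (and hence more sampled frames) than near-static portions, which is the intended contrast with uniform division.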

Highlights

  • We propose an energy-guided temporal segmentation network (EGTSN) that divides videos according to motion energy rather than uniformly, combining energy-guided segmentation with a multimodal fusion heterogeneous CNN

  • This paper proposes an optical flow extraction method for depth video that exploits the characteristics of depth data (see the sketch after this list)

  • We analyze the effect of multimodal fusion on similar actions
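Since no implementation details are given here, the sketch below shows one conventional way to extract dense optical flow from depth frames: normalize each 16-bit depth map to 8-bit grayscale and apply OpenCV's Farneback estimator. The clipping range and Farneback parameters are assumptions for illustration, not values from the paper.

```python
import cv2
import numpy as np

def depth_to_gray(depth: np.ndarray, max_depth_mm: int = 4500) -> np.ndarray:
    """Normalize a 16-bit depth map (millimeters) to 8-bit grayscale.
    The 4500 mm clipping range is an assumption, not from the paper."""
    d = np.clip(depth.astype(np.float32), 0, max_depth_mm) / max_depth_mm
    return (d * 255).astype(np.uint8)

def depth_optical_flow(prev_depth: np.ndarray, next_depth: np.ndarray) -> np.ndarray:
    """Dense optical flow between consecutive depth frames via Farneback.
    Returns an (H, W, 2) array of per-pixel (dx, dy) displacements."""
    prev_g, next_g = depth_to_gray(prev_depth), depth_to_gray(next_depth)
    return cv2.calcOpticalFlowFarneback(
        prev_g, next_g, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
```

A depth-specific pipeline of this kind avoids the texture and illumination noise of RGB flow, since depth values change only with geometry and motion.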


Introduction

As one of the most popular research topics in the field of computer vision, human action recognition has been widely utilized in human–computer interactions, virtual reality, intelligent monitoring, and video retrieval [1,2,3,4]. Though it has recently generated promising performance, video-based action recognition still faces many open challenges [5,6], such as the complexity of action scenes, intra-class differences, and inter-class similarities. Among recent approaches, Zhang et al. [8] used a multi-stream neural network with a joint attribute learner to learn the semantic information shared between action attributes.

