Abstract

Human action recognition in realistic videos is an important and challenging task. Recent studies demonstrate that multi-feature fusion can significantly improve classification performance for human action recognition, and a number of studies therefore employ fusion strategies to combine multiple features, achieving promising results. Nevertheless, previous fusion strategies ignore the correlations between different action categories. To address this issue, we propose a novel multi-feature fusion framework that exploits both the correlations between action categories and multiple features. To describe human actions, the framework combines several classical features extracted with deep convolutional neural networks and improved dense trajectories. Extensive experiments are conducted on two challenging datasets to evaluate the effectiveness of our approach, which obtains state-of-the-art classification accuracies of 68.1% and 93.3% on the HMDB51 and UCF101 datasets, respectively. Furthermore, the proposed approach outperforms five classical fusion schemes, because the learned category correlations are used to combine multiple features. To the best of our knowledge, this work is the first attempt to learn the correlations between different action categories for multi-feature fusion.
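To make the idea concrete, the following is a minimal sketch of correlation-aware late fusion: per-feature classifier scores (e.g. from a CNN stream and an improved-dense-trajectory stream) are combined and then propagated across correlated categories before prediction. All names (correlation_fusion, cnn_scores, idt_scores, class_corr) are hypothetical; the abstract does not specify the paper's actual fusion formulation, so this only illustrates the general principle.

```python
# Hypothetical sketch of correlation-aware late fusion of multiple feature streams.
# The paper's exact fusion mechanics are not given in the abstract; this is only
# an assumed illustration of the general idea.
import numpy as np

def correlation_fusion(score_list, class_corr, weights=None):
    """Fuse per-feature class-score matrices (n_samples x n_classes).

    score_list : list of arrays, one per feature stream (e.g. CNN, iDT).
    class_corr : (n_classes x n_classes) matrix of inter-category correlations,
                 assumed to be learned on training data.
    weights    : optional per-stream weights; uniform if omitted.
    """
    if weights is None:
        weights = np.ones(len(score_list)) / len(score_list)
    fused = sum(w * s for w, s in zip(weights, score_list))
    # Propagate scores across correlated categories before taking the argmax.
    fused = fused @ class_corr
    return fused.argmax(axis=1)

# Toy usage: 4 clips, 3 action classes, random scores.
rng = np.random.default_rng(0)
cnn_scores = rng.random((4, 3))   # e.g. softmax scores from a CNN stream
idt_scores = rng.random((4, 3))   # e.g. SVM scores from improved dense trajectories
class_corr = np.eye(3) + 0.1      # identity plus a small cross-category correlation
print(correlation_fusion([cnn_scores, idt_scores], class_corr))
```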
