Abstract
Artificial intelligence models are widely used for human activity recognition, of which human action recognition is an important aspect. The core of human action recognition is understanding the temporal relationships between video frames. Almost all state-of-the-art methods for human action recognition in videos use optical flow. However, traditional local optical flow estimation methods are expensive and are not trained end-to-end. In this paper, we propose a fast network for human action recognition. Our aim is to improve the efficiency of optical flow feature extraction and to explore how spatio-temporal features can be fused. Our method combines spatial features and temporal features into fusion features. In addition, we propose a CNN with OFF in place of the VGG16 network to process optical flow features and obtain richer features. Our model requires only RGB inputs and achieves state-of-the-art accuracies of 91.5% on UCF-101, 67.9% on HMDB51, 83.3% on MSR Daily Activity3D, and 91.25% on Florence 3D action, respectively. Compared with most state-of-the-art video action recognition models, our proposed model effectively improves the accuracy of human action recognition.
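The abstract describes combining spatial and temporal features into fusion features but does not give details. The sketch below illustrates one common form such fusion can take, per-frame concatenation of the two feature vectors; the feature dimensions, the `fuse` helper, and concatenation itself are assumptions for illustration, not the paper's actual method.

```python
import numpy as np

# Hypothetical feature dimensions (not specified in the abstract).
num_frames, spatial_dim, temporal_dim = 8, 512, 256

# Per-frame spatial features (e.g. from an RGB appearance branch)
# and temporal features (e.g. from an optical-flow branch).
spatial = np.random.rand(num_frames, spatial_dim)
temporal = np.random.rand(num_frames, temporal_dim)

def fuse(spatial_feats, temporal_feats):
    """Combine spatial and temporal features into one fusion feature
    per frame by concatenating along the feature axis."""
    return np.concatenate([spatial_feats, temporal_feats], axis=1)

fused = fuse(spatial, temporal)
print(fused.shape)  # (8, 768)
```

The fused vectors would then feed a downstream classifier; other fusion schemes (element-wise sum, learned gating) are equally plausible readings of the abstract.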