Abstract
Dual-stream human action recognition models achieve high recognition accuracy, but they are less robust to lighting changes. Skeleton-based methods express human behavior and motion well, but they ignore scene information. Drawing on the dual-stream idea, this paper proposes a dual-stream model for human action recognition that combines the human skeleton with scene images. Motion features are extracted by spatio-temporal graph convolution over the human skeleton. A scene recognition model, based on sparse frame sampling of the video and a video-level consensus strategy, processes the scene video and gathers visual scene information. The proposed model thus exploits the strength of skeleton information in expressing motion and the strength of images in representing the scene, and it fuses the two complementary streams to recognize human actions. In experiments under unstable lighting conditions, the model is compared with the conventional optical flow-based dual-stream action recognition method, and its recognition performance is robust and promising.
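The scene stream and the fusion step described above can be illustrated with a minimal sketch. The abstract does not specify the sampling scheme, the consensus function, or the fusion rule, so everything here is an assumption: one random frame is drawn per equal-length segment, consensus is a plain average of per-frame class scores, and fusion is a weighted sum of the two streams' class scores. The function names and the `skeleton_weight` parameter are hypothetical and are not taken from the paper.

```python
import random
from statistics import fmean


def sample_sparse_frames(num_frames, num_segments, rng=None):
    """Pick one frame index from each of num_segments equal-length segments.

    Assumed scheme: the video is split into contiguous segments and a single
    frame is drawn uniformly at random from each one.
    """
    if num_frames < num_segments:
        raise ValueError("need at least one frame per segment")
    rng = rng or random.Random()
    indices = []
    for k in range(num_segments):
        start = k * num_frames // num_segments
        end = (k + 1) * num_frames // num_segments  # exclusive bound
        indices.append(rng.randrange(start, end))
    return indices


def video_level_consensus(frame_scores):
    """Average per-frame class scores into one video-level score vector.

    Averaging is an assumed consensus function; other aggregations are possible.
    """
    return [fmean(column) for column in zip(*frame_scores)]


def fuse_streams(skeleton_scores, scene_scores, skeleton_weight=0.5):
    """Weighted late fusion of class scores from the two streams (assumed rule)."""
    return [
        skeleton_weight * s + (1.0 - skeleton_weight) * v
        for s, v in zip(skeleton_scores, scene_scores)
    ]


if __name__ == "__main__":
    frame_ids = sample_sparse_frames(num_frames=90, num_segments=3)
    # Placeholder scores standing in for a scene classifier's per-frame output.
    scene = video_level_consensus([[0.2, 0.8], [0.4, 0.6], [0.3, 0.7]])
    # Placeholder scores standing in for the skeleton stream's output.
    fused = fuse_streams([0.9, 0.1], scene)
    print(frame_ids, scene, fused)
```

In this sketch the per-frame scores are hard-coded placeholders; in practice they would come from the scene classifier and the skeleton-based graph convolution network, respectively.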