Abstract
In this paper, we study sports video recognition using an improved hidden Markov model (HMM). The feature module is a complex gesture recognition module built on HMM gesture features: it applies HMM features to gesture recognition and, building on simple gesture recognition, recognizes complex gestures formed by combining simple ones. Together, the two modules form the overall technique of this paper. It can be applied in many scenarios, including high-security settings that require real-time feedback and public indoor spaces, where it can provide different protective measures and services for different age groups. Recognition performance improves as the feature extraction network gets deeper. However, a two-dimensional convolutional neural network loses temporal information when extracting features, so this paper uses a three-dimensional convolutional network to extract features from the video in both time and space. Multilabel classification is achieved by performing multiple binary classifications on the extracted features. A multistream residual neural network extracts features from video data in three modalities. The resulting feature vectors are fed into an attention network, which selects the information most relevant to video recognition from the large amount of spatiotemporal information and further learns the temporal dependencies between consecutive video frames. Finally, the outputs of the streams are fused to obtain the predicted category. With end-to-end training and optimization, the model achieves recognition accuracies of 92.7% and 64.4% on the two datasets, respectively.
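The abstract does not say how the "improved" HMM scores a gesture. As a point of reference only, here is a minimal sketch of the standard, unimproved discrete-HMM approach to gesture classification. It trains one HMM per gesture class, scores an observation sequence under each model with the scaled forward algorithm, and picks the class with the highest log-likelihood. The quantized observation symbols, the two toy models named `wave` and `push`, and all parameter values are hypothetical. They are not taken from the paper.

```python
import numpy as np

def forward_log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm for a discrete HMM; returns log P(obs | model).

    obs: sequence of observation-symbol indices (hypothetical quantized pose codes)
    pi:  (N,) initial state distribution
    A:   (N, N) transitions, A[i, j] = P(state j at t+1 | state i at t)
    B:   (N, M) emissions, B[i, k] = P(symbol k | state i)
    """
    alpha = pi * B[:, obs[0]]
    log_lik = 0.0
    for symbol in obs[1:]:
        scale = alpha.sum()            # rescale each step to avoid numeric underflow
        log_lik += np.log(scale)
        alpha = ((alpha / scale) @ A) * B[:, symbol]
    return log_lik + np.log(alpha.sum())

def classify(obs, models):
    """Score obs under every gesture HMM; return the best label and all scores."""
    scores = {name: forward_log_likelihood(obs, *params) for name, params in models.items()}
    return max(scores, key=scores.get), scores

# Hypothetical two-state left-to-right models over three observation symbols.
PI = np.array([1.0, 0.0])
A_LR = np.array([[0.7, 0.3],
                 [0.0, 1.0]])
models = {
    "wave": (PI, A_LR, np.array([[0.8, 0.1, 0.1],
                                 [0.1, 0.8, 0.1]])),
    "push": (PI, A_LR, np.array([[0.1, 0.1, 0.8],
                                 [0.1, 0.8, 0.1]])),
}

label, scores = classify([0, 0, 1, 1], models)
print(label)  # wave
```

In practice the model parameters would be learned, for example with Baum-Welch, rather than set by hand. Working in log space with per-step scaling keeps long video sequences from underflowing to zero.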
Highlights
With the rapid development of computers, networks, multimedia, and related technologies, multimedia data has grown exponentially
The quality of simple gesture recognition determines the quality of complex gesture recognition; this paper tested three complex gestures in different experimental environments, with volunteers standing at different positions in different scenarios
The average accuracy of complex gesture recognition is above 86%, which verifies the robustness of the complex gesture recognition technique proposed in this paper
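The highlights describe complex gestures as combinations of simple gestures, with complex-gesture accuracy depending on simple-gesture accuracy. The paper's actual composition rule is not described here. The sketch below shows one simple way such a combination could work, under stated assumptions. It takes a per-frame simple-gesture label sequence (for example, from an HMM classifier), collapses it into ordered segments while dropping short, likely spurious runs, and matches the result against a vocabulary of complex gestures. The gesture names, the three-entry vocabulary, and the `min_run` threshold are hypothetical.

```python
from itertools import groupby

# Hypothetical vocabulary: each complex gesture is an ordered sequence of simple gestures.
COMPLEX_GESTURES = {
    "raise_then_wave": ("raise_hand", "wave"),
    "wave_then_push": ("wave", "push"),
    "push_then_raise": ("push", "raise_hand"),
}

def collapse(frame_labels, min_run=3):
    """Turn per-frame simple-gesture labels into an ordered tuple of gesture segments."""
    segments = []
    for label, run in groupby(frame_labels):
        run_length = sum(1 for _ in run)
        if label is None or run_length < min_run:
            continue  # skip "no gesture" frames and short, likely spurious runs
        if not segments or segments[-1] != label:
            segments.append(label)  # merge a gesture split by a skipped run
    return tuple(segments)

def recognize_complex(frame_labels, vocabulary=COMPLEX_GESTURES, min_run=3):
    """Return the complex gesture whose pattern matches the segments, or None."""
    segments = collapse(frame_labels, min_run)
    for name, pattern in vocabulary.items():
        if segments == pattern:
            return name
    return None

frames = ["raise_hand"] * 5 + ["wave"] + ["raise_hand"] * 2 + [None] * 2 + ["wave"] * 6
print(recognize_complex(frames))  # raise_then_wave
```

The example shows why simple-gesture errors propagate. A misclassified run longer than `min_run` becomes an extra segment, and then no complex-gesture pattern matches.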
Summary
With the rapid development of computers, networks, multimedia, and related technologies, multimedia data has grown exponentially. In automatic video description research, the automatic analysis and understanding of videos based on human actions has become a popular problem in computer vision and pattern recognition in recent years [2]. Faced with such a huge amount of video data, automatic video description makes it possible to better manage and use these rich video resources and can help users improve the indexing speed and search quality of online videos, so that these resources can be put to greater use. It has broad application prospects in intelligent life assistance, advanced human-computer interaction, and content-based video retrieval, and it is closely followed by researchers both domestically and internationally. Feature extraction is a classical problem in computer vision and machine learning. Unlike feature extraction in image space, the feature representation of human action in video must capture both how a person looks in image space and how the person's posture changes. This extends the feature extraction problem from two-dimensional to three-dimensional space, which greatly increases the complexity of representing behavior patterns and of the subsequent recognition task, while also opening up new ideas and techniques for vision researchers.
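The paragraph above argues that action features must capture change over time, not only per-frame appearance. This is the stated reason the paper uses a 3D convolutional network. The paper's architecture is not given here, so the following is only a naive single-channel illustration of that idea. It computes a "valid" 3D cross-correlation (the operation CNN convolution layers perform) over a (time, height, width) video. A kernel with temporal extent 2 responds to motion and stays silent on a static scene. A kernel with temporal extent 1 reduces to frame-by-frame 2D filtering and cannot form such a frame difference. The function name and the toy videos are hypothetical.

```python
import numpy as np

def conv3d_valid(video, kernel):
    """Naive single-channel 3D cross-correlation, stride 1, no padding.

    video:  (T, H, W) array of grayscale frames
    kernel: (kt, kh, kw) array
    returns an array of shape (T-kt+1, H-kh+1, W-kw+1)
    """
    T, H, W = video.shape
    kt, kh, kw = kernel.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                # The kernel window spans kt consecutive frames, so it sees temporal change.
                out[t, y, x] = np.sum(video[t:t + kt, y:y + kh, x:x + kw] * kernel)
    return out

# Toy videos: a bright dot that stays put vs. one moving right one pixel per frame.
static = np.zeros((4, 5, 5))
static[:, 2, 2] = 1.0
moving = np.zeros((4, 5, 5))
for t in range(4):
    moving[t, 2, t] = 1.0

# Temporal-difference kernel: frame[t+1] - frame[t] at each pixel.
diff_kernel = np.array([-1.0, 1.0]).reshape(2, 1, 1)

print(np.abs(conv3d_valid(static, diff_kernel)).sum())  # 0.0
print(np.abs(conv3d_valid(moving, diff_kernel)).sum())  # 6.0
```

Real 3D CNN layers learn many such kernels across multiple channels, typically with padding and with optimized implementations rather than Python loops.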