Abstract

With the development of computer technology, video description, which combines key technologies from natural language processing and computer vision, has attracted increasing attention from researchers. In particular, describing high-speed, detail-rich sports videos objectively and efficiently is key to the development of the video description field. To address the sentence errors and loss of visual information that arise in generated video description text because existing video description methods lack language-learning information, a multihead model combining a long short-term memory (LSTM) network and an attention mechanism is proposed for the intelligent description of volleyball videos. By introducing the attention mechanism, the model attends to the salient regions of the video when generating sentences. Comparative experiments with different models show that the model with the attention mechanism effectively mitigates the loss of visual information. Compared with the LSTM and base models, the proposed multihead model, which combines an LSTM network and an attention mechanism, scores higher on all evaluation indexes and significantly improves the quality of intelligent text descriptions of volleyball videos.

Highlights

  • With the continuous development of big data, computing power, and machine learning models, video description technology has again sparked a wave of research interest

  • Volleyball videos are often high-speed and rich in detail, which makes it harder for video sensors to understand and intelligently describe visual targets [5]. Therefore, a video sensor processing method combining a long short-term memory network and an attention mechanism is proposed for the intelligent description of volleyball videos. The attention mechanism lets the model focus on the salient regions of the image/video when generating sentences, identify the target quickly, and effectively mitigate the loss of visual information

  • Aiming at the lack of visual information, syntax errors, and strong subjectivity in the video description methods of existing video sensors, this paper proposes a method combining a long short-term memory network and an attention mechanism to describe volleyball videos
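The paper does not give implementation details for the attention step, but the idea described in the highlights, weighting per-frame features by their relevance to the decoder's current state before generating the next word, can be sketched minimally in numpy. The function names, the bilinear scoring form, and all dimensions below are illustrative assumptions, not the authors' actual model:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention_context(frame_features, decoder_state, W):
    """Attention over per-frame video features (illustrative sketch).

    frame_features: (T, D) array, one feature vector per sampled frame
    decoder_state:  (D,) current hidden state of the LSTM decoder
    W:              (D, D) learned scoring matrix (random here, for illustration)
    Returns the attention weights (T,) and the context vector (D,)
    that would condition the next word prediction.
    """
    scores = frame_features @ W @ decoder_state   # (T,) relevance of each frame
    weights = softmax(scores)                     # normalize to a distribution
    context = weights @ frame_features            # (D,) weighted sum of frames
    return weights, context

rng = np.random.default_rng(0)
T, D = 8, 16                                      # 8 frames, 16-dim features
feats = rng.normal(size=(T, D))
state = rng.normal(size=D)
W = rng.normal(size=(D, D))

weights, context = attention_context(feats, state, W)
print(weights.shape, context.shape)               # (8,) (16,)
```

At each decoding step the weights change with the decoder state, so the model can "look at" different frames for different words, which is what lets it recover visual detail that a plain LSTM decoder would lose.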


Summary

Research Article

Received 27 August 2021; Revised 25 September 2021; Accepted 27 September 2021; Published 14 October 2021.

Published in Computational Intelligence and Neuroscience.

[Figure: decoding stage of the model; example generated description: "Four women are playing"]