Abstract
Video captioning is the task of automatically generating a natural language sentence that summarises a video's contents. It requires both modelling the video's temporal structure and integrating that information into a natural language description. It has a variety of applications, including assistance for the visually impaired, video subtitling, and video surveillance. Owing to advances in deep learning for computer vision and natural language processing, research in this area has surged in recent years. Video captioning lies at the intersection of these two fields. In this study, we examine and analyse various approaches to the problem; compare benchmark datasets in terms of domain, repository size, and number of classes; and identify the benefits and drawbacks of evaluation metrics such as BLEU, METEOR, CIDEr, SPICE, and ROUGE.
More From: International Journal of Next-Generation Computing