Abstract

Video captioning algorithms aim to express the information and activities contained in a video clip as natural-language sentences. Most existing video captioning approaches use only one sentence to describe the semantic content of a video. However, one sentence cannot convey all the semantic information of a video, especially when the video is rich in information. A few studies have addressed multi-sentence video captioning, such as paragraph and dense captioning, but they produce several sentences by focusing on different activities, objects, or temporal parts of a video. A video clip with a single object or activity may still contain a great deal of information from different perspectives that cannot be described effectively by a single sentence. To address this problem, we propose a multi-sentence video captioning algorithm that uses the spatial saliency of video frames together with a content-oriented beam search algorithm. In the proposed algorithm, the spatial saliency of video frames is employed during the encoding stage to generate informative sentences by focusing on different parts of the video frames. Furthermore, a content-oriented beam search algorithm is employed during the decoding stage to generate informative sentences. A multi-stage filter is also employed to remove sentences with incorrect structure and sentences that are less relevant to the semantic content of the video. To evaluate the performance of the proposed algorithm, two well-known video description datasets were used, and the results showed a significant improvement in the evaluation metrics, especially for the best-1 sentences. We also tested the proposed algorithm on several real-life movies.
