Abstract

With the rapid growth of video data, especially in cyberspace, video captioning, i.e., the representation of video content in natural language, has been receiving increasing interest in applications such as video retrieval, action recognition, and video understanding, to name a few. In recent years, deep neural networks have been applied successfully to video captioning. However, most existing methods describe a video clip with only one sentence, which may not fully cover the semantic content of the clip. In this paper, a new multi-sentence video captioning algorithm is proposed, based on a content-oriented beam search approach and a multi-stage refining method. The proposed content-oriented beam search algorithm updates the probabilities of the words generated by the trained deep networks, leveraging the high-level semantic information of the input video obtained from an object detector and a structural dictionary of sentences. A multi-stage refining approach then removes structurally incorrect sentences as well as sentences that are weakly related to the semantic content of the video. To this end, a new two-branch deep neural network is proposed to measure the relevance score between a sentence and a video. We evaluated the proposed method on two popular video captioning databases and compared it with several state-of-the-art approaches. The experiments show the superior performance of the proposed algorithm; for instance, on the MSVD database, the proposed method achieves a 6% enhancement for the best-1 sentences over the best state-of-the-art alternative.
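The abstract does not give the exact reweighting rule of the content-oriented beam search, so the sketch below only illustrates the general idea under stated assumptions: a standard beam search over next-token log-probabilities, where a hypothetical additive `content_bonus` stands in for the detector-driven update of word probabilities described above (e.g., boosting words whose objects were detected in the video).

```python
import math

def beam_search(step_log_probs, beam_width=3, max_len=10,
                content_bonus=None, eos="</s>"):
    """Generic beam search over a next-token distribution.

    `step_log_probs(prefix)` returns {token: log-probability} for the
    next token. `content_bonus` is a hypothetical stand-in for a
    content-oriented reweighting: an additive log-score per token.
    """
    beams = [([], 0.0)]  # (token sequence, cumulative log-score)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq and seq[-1] == eos:  # finished hypothesis, keep as-is
                candidates.append((seq, score))
                continue
            for tok, lp in step_log_probs(seq).items():
                bonus = content_bonus(tok) if content_bonus else 0.0
                candidates.append((seq + [tok], score + lp + bonus))
        # keep only the `beam_width` highest-scoring hypotheses
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams

# Toy next-token model: the language model alone favors "a".
def toy_model(prefix):
    if not prefix:
        return {"a": math.log(0.7), "b": math.log(0.3)}
    return {"a": math.log(0.4), "</s>": math.log(0.6)}

plain = beam_search(toy_model, beam_width=2, max_len=3)
# Hypothetical detector signal boosting "b", as if an object detector
# had found the corresponding object in the video:
boosted = beam_search(toy_model, beam_width=2, max_len=3,
                      content_bonus=lambda t: 2.0 if t == "b" else 0.0)
```

With the bonus applied, the top hypothesis flips from starting with "a" to starting with "b", showing how detector evidence can steer the decoder toward content-relevant words without retraining the captioning network.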
