Abstract

Since news videos are valuable sources of multimedia information on real-world events, there is a demand for viewing them efficiently. However, summarization methods based on auditory contents do not take the visual contents into account. In news videos, the audio and visual contents do not necessarily come from the same source, so such methods can severely reduce the amount of informative visual content included in the summarized video. Thus, we propose a method for summarizing a sequence of news videos that considers the consistency of both auditory and visual contents. The proposed method first selects key-sentences from the auditory contents (Closed Caption) of each news story in the sequence, and then selects, for each key-sentence, the shot within the news story whose Visual Concepts detected from the visual contents are the most consistent with it. Finally, the audio segment corresponding to each key-sentence is overlaid onto the selected shot, and the resulting segments are concatenated to generate a summarized video. The effectiveness of the proposed method was confirmed on several news topics through a subjective experiment.
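
The shot-selection step can be illustrated with a minimal sketch. The abstract does not specify how consistency between the selected caption text and the Visual Concepts is measured, so the scoring below (summing detector confidences of concepts whose label appears in the text) is an assumption made only for illustration, as are the names `Shot`, `consistency`, `select_shot`, and `summarize`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Shot:
    shot_id: str
    # Hypothetical representation: concept label -> detector confidence.
    concepts: dict[str, float]


def consistency(text: str, shot: Shot) -> float:
    """Assumed measure: sum the confidences of concepts named in the text."""
    words = {w.strip(".,;:!?\"'()").lower() for w in text.split()}
    return sum(
        score for label, score in shot.concepts.items() if label.lower() in words
    )


def select_shot(text: str, shots: list[Shot]) -> Shot:
    """Return the most consistent shot; ties go to the earliest shot.

    Raises ValueError if `shots` is empty.
    """
    return max(shots, key=lambda s: consistency(text, s))


def summarize(stories: list[tuple[str, list[Shot]]]) -> list[tuple[str, str]]:
    """Pair each story's selected caption text with the id of its chosen shot.

    The audio segment for each text would then be laid over the chosen
    shot and the results concatenated; that step is outside this sketch.
    """
    return [(text, select_shot(text, shots).shot_id) for text, shots in stories]
```

For example, given an anchor shot with concepts such as "studio" and a field shot with concepts "flood" and "river", a caption sentence mentioning the river flood would be paired with the field shot rather than the anchor shot during which it may have been spoken.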
