Abstract

In this paper, we present a method for video semantic mining. Speech signals, video caption text and video frame images are all key factors that help a person understand video content. Based on this observation, we propose a method that integrates continuous speech recognition, video caption text recognition and object recognition. The video is first segmented into a series of shots by shot detection. The caption text and speech recognition results are then treated as two paragraphs of text, and the object recognition results are represented as a bag of words. All three texts are processed by part-of-speech tagging and stemming, and only the nouns are kept. As a result, a video is represented by three bags of words. These words are further organized into a graph whose vertices stand for the words and whose edges denote the semantic distance between two neighboring words. In the last step, we apply a dense subgraph finding method to mine the semantic meaning of the video. Experiments show that the proposed video semantic mining method is efficient.
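The noun-filtering and graph-mining steps can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several pieces are assumptions: the noun filter expects tokens already tagged with Penn Treebank POS tags (stemming assumed done upstream), the `distance` function and its values are hypothetical placeholders for whatever semantic distance is used, any pair within an assumed threshold `max_dist` is treated as "neighboring", distances are converted to similarity weights as 1/(1 + d), and Charikar's greedy peeling for the densest subgraph stands in for the unspecified dense subgraph finding method.

```python
from itertools import combinations

def noun_bag(tagged_tokens):
    """Keep only nouns (Penn Treebank tags NN, NNS, NNP, NNPS) as a bag of
    lowercase words. Stemming is assumed to have been applied upstream."""
    return {word.lower() for word, tag in tagged_tokens if tag.startswith("NN")}

def build_word_graph(bags, distance, max_dist=1.0):
    """Merge the noun bags into one vertex set and connect each word pair whose
    (hypothetical) semantic distance is known and at most max_dist.
    Distances become similarity weights 1 / (1 + d): closer words weigh more."""
    words = sorted(set().union(*bags))
    adj = {w: {} for w in words}
    for a, b in combinations(words, 2):
        d = distance(a, b)
        if d is not None and d <= max_dist:
            adj[a][b] = adj[b][a] = 1.0 / (1.0 + d)
    return adj

def densest_subgraph(adj):
    """Greedy peeling (Charikar): repeatedly delete the vertex with the smallest
    weighted degree and return the intermediate vertex set with the highest
    density, i.e. total edge weight divided by vertex count."""
    adj = {u: dict(nbrs) for u, nbrs in adj.items()}  # work on a copy
    if not adj:
        return set(), 0.0
    total = sum(sum(nbrs.values()) for nbrs in adj.values()) / 2.0
    best_set, best_density = set(adj), total / len(adj)
    while len(adj) > 1:
        # Ties on degree are broken alphabetically for reproducibility.
        u = min(adj, key=lambda v: (sum(adj[v].values()), v))
        total -= sum(adj[u].values())
        for v in adj[u]:
            del adj[v][u]
        del adj[u]
        density = total / len(adj)
        if density > best_density:
            best_set, best_density = set(adj), density
    return best_set, best_density

# Toy input: three hypothetical bags for one shot (speech, caption, objects).
speech = noun_bag([("The", "DT"), ("player", "NN"), ("kicks", "VBZ"),
                   ("the", "DT"), ("ball", "NN"), ("despite", "IN"),
                   ("the", "DT"), ("weather", "NN")])
caption = noun_bag([("Goal", "NN"), ("in", "IN"), ("the", "DT"), ("match", "NN")])
objects = {"ball", "player", "grass"}

# Hypothetical pairwise semantic distances; unlisted pairs are unrelated.
DIST = {
    frozenset({"goal", "player"}): 0.2, frozenset({"goal", "ball"}): 0.3,
    frozenset({"player", "ball"}): 0.2, frozenset({"match", "player"}): 0.4,
    frozenset({"match", "goal"}): 0.4, frozenset({"grass", "ball"}): 0.9,
}
graph = build_word_graph([speech, caption, objects],
                         lambda a, b: DIST.get(frozenset({a, b})))
core, density = densest_subgraph(graph)
print(sorted(core), round(density, 3))  # ['ball', 'goal', 'match', 'player'] 0.966
```

On this toy input the isolated word "weather" and the weakly connected "grass" are peeled away, leaving the tightly related nouns as the mined semantic core of the shot. Greedy peeling is used here only because it is short and carries a known factor-2 approximation guarantee for the densest subgraph; an exact or different dense subgraph method could be substituted without changing the graph construction.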