Abstract
The rapid growth of multimedia data calls for more effective content-based video browsing and retrieval. We present a video browsing and retrieval system based on multimedia integration. First, we define the basic structure of the system. Second, we present a robust scene segmentation method that analyzes audio and visual information and accounts for their interrelations and coincidence to identify video scenes semantically. Third, we extract text from key frames using video OCR and obtain transcriptions through speech recognition; these texts are used to classify video scenes and to build full-text indices. Finally, natural language understanding techniques automatically classify video scenes on the basis of the text obtained from closed captions, video OCR, and speech recognition. In this way, we have developed a content-based video database system that integrates multiple modalities for browsing and retrieving video data. The experimental results show that multimodal integration is effective for video scene segmentation. Built on the idea of multimodal integration, our system makes content-based browsing and retrieval of video data, key-frame-based video abstracts, and keyword search practical.