Abstract

Scene change detection is an essential step in automatic, content-based video indexing, retrieval, and browsing. In this paper, a robust scene change detection method is presented that analyzes both audio and visual information sources and accounts for their inter-relations and coincidence to semantically identify video scenes. Audio analysis focuses on segmenting the audio stream into four semantic types: silence, speech, music, and environmental sound. Speech data are further decomposed into segments corresponding to individual speakers. Meanwhile, visual analysis partitions the video stream into shots. Results from single-source segmentation are in some cases suboptimal. By combining visual and audio features, scene extraction accuracy is enhanced and the resulting segmentation is more semantically meaningful. Experimental results show that the method is appropriate for content-based video indexing and retrieval.
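The fusion step described above can be illustrated with a minimal sketch. This is not the authors' algorithm; it is a hypothetical example assuming that shot boundaries and audio segment boundaries are available as lists of timestamps, and that a scene change is declared only where a visual shot boundary coincides (within a tolerance) with an audio segment boundary:

```python
def align_scene_changes(shot_boundaries, audio_boundaries, tol=0.5):
    """Hypothetical audio-visual fusion sketch.

    shot_boundaries: timestamps (seconds) of detected shot cuts.
    audio_boundaries: timestamps (seconds) where the audio type
        changes (e.g. speech -> music, or a speaker change).
    tol: maximum allowed offset (seconds) for a coincidence.

    Returns the shot boundaries that coincide with an audio
    boundary, treated here as candidate scene changes.
    """
    scene_changes = []
    for shot_t in shot_boundaries:
        # Keep the shot cut only if some audio boundary lies close by.
        if any(abs(shot_t - audio_t) <= tol for audio_t in audio_boundaries):
            scene_changes.append(shot_t)
    return scene_changes


if __name__ == "__main__":
    shots = [10.0, 25.3, 40.1]   # shot cuts from visual analysis
    audio = [9.8, 40.3]          # boundaries from audio segmentation
    print(align_scene_changes(shots, audio))
```

Here the cut at 25.3 s is treated as an intra-scene shot change (no accompanying audio change), while the cuts near 10 s and 40 s become scene-change candidates. Real systems would weight the two modalities rather than require strict coincidence.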
