Abstract

Audio content analysis is useful in many multimedia applications. We present a unified framework for content analysis of composite audio. The framework is designed to extract relevant information from the different available audio modalities and to discover the high-level semantics conveyed by the data. We also demonstrate an implementation of the framework that detects scenes and events in various TV shows and movies: key audio effects are first extracted as a mid-level representation, and a Bayesian network then infers the high-level semantics. Experiments on 12 hours of audio data indicate that the proposed framework performs satisfactorily.
