Abstract

A generic framework for video semantic content analysis, grounded in statistical theory, is proposed in this paper. Multilayer semantic analysis and multimodal information fusion are unified in a single model. First, a frame-segment key-frame strategy and an attention selection model are used to represent video content concisely, and basic visual semantics are recognized with pattern-classification techniques. A multilayer structural model is then used to extract multi-level visual semantics. Next, an audio semantic analysis scheme is presented, based on spectrum features extracted with the Fourier transform. Finally, a bionic multimodal fusion method with a two-level structure is proposed for video semantic concept analysis. Experimental results demonstrate that the framework can fuse multimodal features, extract semantics at different granularities, and bridge the semantic gap to some extent.
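The audio scheme above relies on spectrum features obtained with the Fourier transform. A minimal sketch of one such feature, assuming a windowed magnitude spectrum per audio frame (the function name, window choice, and parameters are illustrative, not taken from the paper):

```python
import numpy as np

def spectrum_feature(frame, n_fft=512):
    """Magnitude spectrum of one audio frame via the FFT.

    A Hann window is applied first to reduce spectral leakage;
    the real FFT returns n_fft // 2 + 1 frequency bins.
    """
    frame = np.asarray(frame, dtype=float)[:n_fft]
    windowed = frame * np.hanning(len(frame))
    return np.abs(np.fft.rfft(windowed, n=n_fft))

# Illustration: a 440 Hz tone sampled at 16 kHz should peak
# near bin 440 / 16000 * 512, i.e. around bin 14.
sr, n = 16000, 512
t = np.arange(n) / sr
feat = spectrum_feature(np.sin(2 * np.pi * 440 * t))
```

Features like this, computed per frame, could then feed the classifier used for audio semantics in the fusion stage.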
