Current information and communication technologies provide the infrastructure to transport bits anywhere, but they do not indicate how to access or route information easily and precisely at the semantic level. To facilitate intelligent access to the rich multimedia data on the Internet, we develop an on-line knowledge- and rule-based video classification system that supports automatic “indexing” and “filtering” based on a semantic concept hierarchy. This paper investigates the use of video and audio content analysis, feature extraction, and clustering techniques for video semantic-concept classification. A supervised rule-based video classification system is proposed that uses automatic video segmentation, annotation, and summarization techniques for seamless information browsing and updating. In the proposed system, a real-time scene-change detection proxy performs an initial video-structuring step by splitting a video clip into scenes. Motion, visual, and audio features are extracted in real time for every detected scene by on-line feature-extraction proxies. Higher-level semantics are then derived by combining these low-level features with classification rules in the knowledge base. The classification rules are derived through a supervised learning process that relies on representative samples from each semantic category. An indexing and filtering process can then be built on the semantic concept hierarchy to personalize multimedia data according to users’ interests. In real-time filtering, multiple video streams are blocked, combined, or routed to particular channels depending on whether they match the user’s profile. We have extensively evaluated the classification and filtering techniques using basketball sports video data. In particular, in our experiments the basketball video structure is examined and categorized into different classes according to distinct motion, visual, and audio characteristics by a rule-based classifier. The concept hierarchy describing the motion/visual/audio feature descriptors and their statistical relationships is reported in this paper, along with detailed experimental results on on-line sports videos.
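As a rough illustration of the rule-based scene classification and profile-based filtering steps summarized above, the following sketch applies hand-written threshold rules to per-scene low-level features and routes scenes against a user profile. The feature names, categories, thresholds, and profile format are assumptions made for demonstration only; in the actual system the rules are derived from supervised learning on labeled samples.

```python
# Minimal sketch of rule-based scene classification and profile filtering.
# All thresholds, feature names, and category labels are illustrative
# assumptions, not the system's learned classification rules.
from dataclasses import dataclass


@dataclass
class SceneFeatures:
    motion_intensity: float   # average motion-vector magnitude in the scene
    court_ratio: float        # fraction of pixels matching the court color
    audio_energy: float       # normalized short-time audio energy


def classify_scene(f: SceneFeatures) -> str:
    """Apply simple threshold rules (stand-ins for learned rules)."""
    if f.motion_intensity > 0.6 and f.court_ratio > 0.5:
        return "fast-break"        # high motion over a full-court view
    if f.motion_intensity < 0.2 and f.court_ratio < 0.3:
        return "close-up"          # low motion, little court visible
    if f.audio_energy > 0.7:
        return "crowd-highlight"   # loud audience reaction
    return "normal-play"


# Example: filter incoming scenes against a hypothetical user profile.
user_profile = {"fast-break", "crowd-highlight"}
scenes = [SceneFeatures(0.8, 0.6, 0.4), SceneFeatures(0.1, 0.2, 0.3)]
for i, s in enumerate(scenes):
    label = classify_scene(s)
    action = "forward to user channel" if label in user_profile else "block"
    print(f"scene {i}: {label} -> {action}")
```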