Abstract

This paper presents a new multimodal method for extracting semantic information from basketball video. Visual, motion, and audio information are first extracted from the video to produce low-level segmentation and classification. Domain knowledge is then exploited to detect interesting events in the basketball video. For the visual stream, both visual and motion-prediction information are utilized in a scene boundary detection algorithm, followed by scene classification. For audio, keysounds (specific sounds related to semantic events) are identified with a classification method based on hidden Markov models (HMMs). Subsequently, by analyzing the multimodal information together with additional domain knowledge, the positions of potential semantic events, such as fouls and shots at the basket, are located. Finally, a video annotation is generated according to MPEG-7 multimedia description schemes (MDSs). Experimental results demonstrate the effectiveness of the proposed method.

Highlights

  • In recent years, with the remarkable increase of video data generated and distributed through networks, there is an evident need to develop an intelligent video browsing and indexing system

  • We address the problem of semantic basketball video analysis and annotation for MPEG compressed videos using multimodal information

  • Since the semantic understanding of video content is highly dependent on the utilization of contextual information and domain rules, a basketball video analysis and annotation method is proposed based on visual, motion, and audio information as well as domain-specific knowledge

Summary

INTRODUCTION

With the remarkable increase of video data generated and distributed through networks, there is an evident need to develop intelligent video browsing and indexing systems. We develop tools based on visual, motion, and audio information for analyzing and annotating basketball video using both low-level features and domain knowledge. In prior work, dynamic programming techniques were used to obtain the maximum-likelihood play/break segmentation of a soccer video sequence at the symbol level. These works demonstrated that the HMM is an effective and efficient tool for representing continuous-time signals and discovering structure in video content. To achieve detailed semantic basketball video analysis and annotation, we combine audio and motion features with other low-level features such as color and texture. Before ending this introduction, we list our main contributions: (1) motion-based scene boundary detection, (2) basketball scene classification based on visual and motion information, (3) HMM-based audio keysound detection, (4) high-level semantic inference and multimodal event detection, and (5) MPEG-7 standard-compliant output for basketball video annotation.
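The HMM-based keysound detection in contribution (3) can be sketched as follows: one discrete-emission HMM is trained per keysound class, and an observed sequence of quantized audio features is assigned to the class whose model yields the highest forward-algorithm likelihood. This is a minimal illustrative sketch; the two class names ("whistle", "applause"), the 3-symbol codebook, and all model parameters below are hypothetical assumptions, not the paper's trained values.

```python
def forward_likelihood(pi, A, B, obs):
    """Forward algorithm: P(obs | model) for a discrete-emission HMM.

    pi : initial state probabilities, pi[s]
    A  : state transition matrix, A[s][t] = P(t | s)
    B  : emission matrix, B[s][o] = P(symbol o | state s)
    obs: sequence of observed symbol indices
    """
    n = len(pi)
    # Initialize forward variables with the first observation.
    alpha = [pi[s] * B[s][obs[0]] for s in range(n)]
    # Propagate: sum over predecessor states, weight by the new emission.
    for o in obs[1:]:
        alpha = [sum(alpha[s] * A[s][t] for s in range(n)) * B[t][o]
                 for t in range(n)]
    return sum(alpha)

# Hypothetical 2-state models for two keysound classes over a
# 3-symbol codebook of quantized audio features (illustrative numbers).
MODELS = {
    "whistle": dict(pi=[0.9, 0.1],
                    A=[[0.7, 0.3], [0.4, 0.6]],
                    B=[[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]),
    "applause": dict(pi=[0.5, 0.5],
                     A=[[0.5, 0.5], [0.5, 0.5]],
                     B=[[0.2, 0.2, 0.6], [0.1, 0.3, 0.6]]),
}

def classify(obs):
    """Assign the keysound label whose HMM best explains the sequence."""
    return max(MODELS, key=lambda k: forward_likelihood(obs=obs, **MODELS[k]))
```

In the full system, the per-class models would be trained (e.g. with Baum-Welch) on labeled audio segments, and maximum-likelihood classification would run over sliding windows of the soundtrack.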

MULTIMODAL BASKETBALL VIDEO ANALYSIS AND ANNOTATION
Shot and scene boundary detections
Scene classification
Audio keysound detection utilizing hidden Markov models
Our proposed hidden Markov model
Multimodal structure analysis and event detection
MPEG-7 compliant annotation file generation
EXPERIMENTAL RESULTS
Video shot and scene detection
Video scene classification
Audio keysound detection
Findings
CONCLUSION