To integrate multimodal information and multilayer constraints effectively, we present a unified probabilistic framework for sports video analysis. Based on this framework, three instances of statistical models are constructed and compared. Experimental results indicate that our multimodal-fusion method analyzes semantic events in sports video more effectively. Building on this statistical foundation, we further discuss sports video content analysis methods that fuse multimodal information.

Semantic events in sports video are inherently multimodal. In television broadcasts, multiple modalities are used together to convey video content: subtitles, the commentator's voice, on-site sounds, camera movements, scenes, and images. Analyzing only one modality is therefore incomplete; more effective event analysis requires methods that fuse multiple modalities. Moreover, semantic events in such videos are not isolated: logical and temporal relationships exist among them. In previous work, we discussed event detection and recognition using these contextual relationships based on a dynamic Bayesian network (1-2). On that basis, we now explore how to fuse multimodal information, which is the key issue addressed here.

In recent years, multimodal fusion has become an active topic in sports video analysis (3-5). We first review related work. Most multimodal fusion methods in the literature consider the detection of a single, isolated event. In contrast, we propose to detect multiple events and to comprehensively analyze the associations among them. Multi-level statistical analysis methods are built on probabilistic graphical models such as the hidden Markov model (HMM), the dynamic Bayesian network (DBN), and their variants. By combining an intuitive graphical representation with effective inference and learning algorithms, such approaches have achieved fairly good results.
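To make the graphical-model machinery concrete, the following is a minimal sketch of the standard HMM forward algorithm, which underlies both HMM- and DBN-based video analysis. The two hidden states, three observation symbols, and all probability values are hypothetical illustrations, not parameters from this paper.

```python
import numpy as np

# Toy HMM: two hidden event states ("play", "break") and three observed
# shot types (0 = long shot, 1 = medium shot, 2 = close-up).
# All probabilities below are hypothetical, for illustration only.
pi = np.array([0.6, 0.4])          # initial state distribution
A = np.array([[0.7, 0.3],          # state transition matrix A[i, j] = P(j | i)
              [0.4, 0.6]])
B = np.array([[0.5, 0.3, 0.2],     # emission matrix B[i, k] = P(symbol k | state i)
              [0.1, 0.3, 0.6]])

def forward_likelihood(obs):
    """Compute P(observation sequence) with the forward algorithm."""
    alpha = pi * B[:, obs[0]]                  # initialize with first observation
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]          # propagate and absorb next observation
    return float(alpha.sum())

likelihood = forward_likelihood([0, 0, 2, 1])
```

The forward recursion sums over all hidden state paths in linear time; learning the parameters from labeled broadcast segments (rather than fixing them by hand) is what the statistical training in this line of work provides.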
Xie et al. (6-7) applied hierarchical HMMs with unsupervised clustering to discover the layered structure of video content. In contrast, we introduced a dynamic Bayesian network model to perform the same task: through learning from training samples, we achieved detection and recognition of highlight events in football matches (8). Based on the DBN, we developed a general sports video analysis framework. Compared with the previous work mentioned above, our approach can not only integrate multimodal information for event detection but also handle hierarchical relationships among events. Although both capabilities were described in previous papers, they as a whole were not paid sufficient attention.
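As a minimal sketch of what probabilistic multimodal fusion means here, the snippet below combines evidence from two modalities under a conditional-independence (naive-Bayes) simplification of the DBN idea: each modality's observation is assumed independent given the event. The event labels, modality likelihoods, and all numeric values are hypothetical, not taken from the paper's experiments.

```python
import numpy as np

# Hypothetical prior and per-modality likelihoods for a binary event
# (index 0 = "goal", index 1 = "no goal"). Values are illustrative only.
p_event  = np.array([0.1, 0.9])    # prior P(event)
p_audio  = np.array([0.8, 0.2])    # P(crowd cheer observed | event)
p_visual = np.array([0.7, 0.1])    # P(goal-mouth shot observed | event)

# Fuse modalities: unnormalized joint, then normalize to a posterior.
posterior = p_event * p_audio * p_visual
posterior /= posterior.sum()       # P(event | audio, visual evidence)
```

Even with a low prior, agreement between the audio and visual evidence pushes the posterior toward the "goal" hypothesis; a full DBN additionally chains such posteriors over time and across hierarchy levels.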