TV program segmentation has emerged as a major topic over the last decade for high-quality indexing of multimedia content. Earlier approaches to TV program segmentation are either highly supervised (e.g., event detection) or too specific to a certain type of program (e.g., cluster-based methods), which makes them impractical for indexing tasks because they do not generalize across program types. In this paper, we address the problem of unsupervised TV program segmentation by leveraging grammatical inference, i.e., discovering a common structural model shared by a collection of episodes of a recurrent TV program by finding an optimal alignment of structural elements across episodes. A structural element is a video segment with a particular syntactic meaning with respect to the video structure. Representing structural elements symbolically makes it feasible to apply grammatical inference to TV program modeling and allows segmentation to rely on only minimal domain knowledge. The proposed approach operates in two phases. The first phase obtains a symbolic representation of each episode, in which the elements relevant to the structure are discovered through recurrence mining. The second phase performs grammatical inference on the symbolic representations of the episodes. We investigate two inference techniques, one based on multiple sequence alignment and one relying on uniform resampling, to infer structural grammars for TV programs. A model of the structure is derived from the structural grammars and used to predict the structure of new episodes. A comparative evaluation of the two grammar inference approaches demonstrates that the resulting models reflect the structure of the program and can predict the structure of unseen episodes, which is the main industrial application of the proposed approach, i.e., assisting librarians with segmentation tasks.
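The idea of aligning structural elements across episodes can be illustrated with a minimal sketch. It is not the authors' method: it substitutes a plain pairwise global alignment (Needleman-Wunsch) for the multiple sequence alignment mentioned above, and the symbol names, scoring values, and function names are all assumptions chosen for illustration.

```python
# Illustrative sketch only: pairwise alignment of two symbolic episodes.
# Symbols such as "jingle" or "anchor" are hypothetical labels, and "-"
# is reserved as the gap marker (it must not be used as a symbol).

def align(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two symbol sequences (Needleman-Wunsch)."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align the two symbols
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return out_a[::-1], out_b[::-1]


def shared_structure(a, b):
    """Return the symbols on which the two aligned episodes agree."""
    aligned_a, aligned_b = align(a, b)
    return [x for x, y in zip(aligned_a, aligned_b) if x == y]


# Two hypothetical episodes of the same program, as symbol sequences.
episode_1 = ["jingle", "anchor", "report", "weather"]
episode_2 = ["jingle", "anchor", "weather"]
print(shared_structure(episode_1, episode_2))
```

In this toy setting, the symbols on which the aligned episodes agree play the role of a shared structural model, while gapped columns mark elements that appear in only one episode. A multiple sequence alignment extends the same principle to a whole collection of episodes.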