Towards Theoretical Performance Limits of Video Parsing

Alan Hanjalic

doi:10.1109/tcsvt.2007.890833

Abstract

This paper analyzes the problem of temporal video segmentation, or video parsing, and explores the possibility of defining theoretical limits on the expected performance of a general parsing algorithm. In particular, we address the challenge of computing the coherence of video content, which is critical to an algorithm's ability to parse a video automatically. If this coherence is difficult to extract from the video data, it is unrealistic to expect any parsing algorithm applied to that data to perform optimally with respect to the ground truth, regardless of the features and approach used. The coherence computability (CC) measure introduced in this paper is derived from the average uncertainty in extracting content-related information from the data, which translates into uncertainty in deciding whether a boundary is present at a given time stamp of a video. We argue that the CC measure reveals the true quality of a video parsing algorithm better than the classical comparison of parsing results with the ground truth. We also discuss how this measure can be used to characterize and compare video sequences in terms of expected parsing performance, and to interpret and evaluate the obtained parsing results accordingly.
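To make the idea of "average decision uncertainty" concrete, the following toy sketch scores a sequence by how uncertain a yes/no boundary decision is at each time stamp. It is an illustration under stated assumptions, not the CC definition from this paper: it assumes that some hypothetical upstream analysis already supplies a probability of boundary presence per time stamp, measures per-stamp uncertainty as binary entropy in bits, and defines a hypothetical score `coherence_computability` as one minus the mean entropy. Under this simplification, a value near 1 means boundary decisions are easy to make and a value near 0 means they are close to coin flips.

```python
import math
from typing import Sequence


def binary_entropy(p: float) -> float:
    """Entropy (bits) of a yes/no boundary decision with P(boundary) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a certain decision carries no uncertainty
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def coherence_computability(boundary_probs: Sequence[float]) -> float:
    """Hypothetical CC-style score in [0, 1].

    Assumption (not from the paper): boundary_probs[t] is an externally
    estimated probability that a boundary is present at time stamp t.
    The score is 1 minus the mean per-time-stamp decision entropy.
    """
    if not boundary_probs:
        raise ValueError("need at least one time stamp")
    mean_uncertainty = sum(binary_entropy(p) for p in boundary_probs) / len(boundary_probs)
    return 1.0 - mean_uncertainty


# Usage: a sequence with clear-cut boundary evidence vs. an ambiguous one.
clear = [0.02, 0.97, 0.01, 0.99]
ambiguous = [0.5, 0.45, 0.55, 0.5]
print(coherence_computability(clear), coherence_computability(ambiguous))
```

Used this way, such a score depends only on how decidable the boundaries are in the data, not on the output of any particular parser. This mirrors the abstract's argument that expected parsing performance can be characterized per sequence, before any comparison with the ground truth.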
