Abstract

This paper examines the problem of temporal video segmentation, or video parsing, and explores the possibility of defining theoretical limits on the expected performance of a general parsing algorithm. In particular, we address the challenge of computing the coherence of video content, which is critical to an algorithm's ability to parse a video automatically. If this coherence is difficult to extract from the video data, it is unrealistic to expect any parsing algorithm applied to that data to perform optimally with respect to the ground truth, regardless of the features and approach used. The measure of coherence computability (CC) we introduce in this paper is derived from the average uncertainty in extracting content-related information from the data, which translates into the uncertainty in deciding whether a boundary is present at a given time stamp of a video. We argue that the CC measure is more powerful in revealing the true quality of a video parsing algorithm than the classical comparison of parsing results with the ground truth. We also discuss how this measure can be used to characterize and compare video sequences in terms of expected parsing performance, and to interpret and evaluate the obtained parsing results accordingly.
