Abstract

We present a key frame extraction method dedicated to summarizing unstructured consumer video clips acquired with digital cameras. Analysis of spatio-temporal changes provides meaningful information about the scene and the cameraman's general intent. First, camera and object motion are estimated and used to derive motion descriptors. The video is then segmented into homogeneous segments based on the major types of camera motion (e.g., pan, zoom, pause, steady). Dedicated rules are used to extract candidate key frames from each segment, and confidence measures are computed for the candidates so that they can be ranked by semantic relevance. The method is scalable: any desired number of key frames can be produced from the candidates. We demonstrate the effectiveness of our method by comparing its results with a ground truth agreed upon by multiple judges.
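The segment-then-rank pipeline summarized above can be illustrated with a small sketch. It is not the authors' implementation. It assumes that per-frame camera-motion labels have already been estimated. The candidate rules are hypothetical stand-ins for the paper's dedicated rules: the middle frame of a pan, pause, or steady segment, and the last frame of a zoom. The length-based confidence score is likewise a placeholder for its confidence measures. All function names are illustrative only.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Segment:
    label: str  # camera-motion type, e.g. "pan", "zoom", "pause", "steady"
    start: int  # index of the first frame (inclusive)
    end: int    # index of the last frame (inclusive)


def segment_by_motion(labels):
    """Group consecutive frames that share the same camera-motion label."""
    segments = []
    index = 0
    for label, run in groupby(labels):
        length = len(list(run))
        segments.append(Segment(label, index, index + length - 1))
        index += length
    return segments


def candidate_key_frames(segments):
    """Hypothetical rules: one (frame, confidence) candidate per segment."""
    candidates = []
    for seg in segments:
        length = seg.end - seg.start + 1
        if seg.label == "zoom":
            # Assumed rule: the end of a zoom shows what the camera zoomed in on.
            frame = seg.end
        else:
            # Assumed rule: the middle frame represents pans, pauses and steady shots.
            frame = (seg.start + seg.end) // 2
        # Assumed confidence: longer segments are treated as more relevant.
        candidates.append((frame, float(length)))
    return candidates


def select_key_frames(candidates, k):
    """Keep the k most confident candidates and return them in temporal order."""
    top = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
    return sorted(frame for frame, _ in top)


if __name__ == "__main__":
    labels = ["steady"] * 5 + ["pan"] * 3 + ["zoom"] * 4 + ["steady"] * 2
    candidates = candidate_key_frames(segment_by_motion(labels))
    print(select_key_frames(candidates, 2))  # [2, 11]
```

Because candidates are scored once, producing a different number of key frames only changes `k` in `select_key_frames`; the segmentation and candidate extraction do not need to be rerun.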
