Key frame extraction from unstructured consumer video clips

Christophe Papin, Jiebo Luo

doi:10.1117/12.704373

Abstract

We present a key frame extraction method dedicated to summarizing unstructured consumer video clips acquired with digital cameras. Analysis of spatio-temporal changes provides meaningful information about the scene and the cameraman's general intent. First, camera and object motion are estimated and used to derive motion descriptors. A video is then segmented into homogeneous segments based on the major types of camera motion (e.g., pan, zoom, pause, steady). Dedicated rules are used to extract candidate key frames from each segment. Confidence measures are computed for the candidates so that they can be ranked by semantic relevance. The method is scalable: any desired number of key frames can be produced from the candidates. We demonstrate the effectiveness of our method by comparing its results with ground truth agreed upon by multiple judges.
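
The segmentation and scalable selection steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a camera-motion label is already available for each frame and that each candidate already carries a confidence score, neither of which the abstract specifies how to compute. The names `Candidate`, `segment_by_motion`, and `select_key_frames` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Tuple


@dataclass
class Candidate:
    """A candidate key frame (hypothetical structure, not from the paper)."""
    frame_index: int
    confidence: float  # assumed to be given; higher means more relevant


def segment_by_motion(labels: List[str]) -> List[Tuple[str, int, int]]:
    """Group consecutive frames that share a camera-motion label.

    Returns (label, first_frame, last_frame) tuples with inclusive bounds.
    """
    segments = []
    start = 0
    for label, run in groupby(labels):
        length = len(list(run))
        segments.append((label, start, start + length - 1))
        start += length
    return segments


def select_key_frames(candidates: List[Candidate], n: int) -> List[Candidate]:
    """Keep the n highest-confidence candidates, returned in temporal order."""
    top = sorted(candidates, key=lambda c: c.confidence, reverse=True)[:n]
    return sorted(top, key=lambda c: c.frame_index)


if __name__ == "__main__":
    # Per-frame motion labels are assumed to come from an upstream estimator.
    labels = ["pan", "pan", "steady", "steady", "steady", "zoom"]
    print(segment_by_motion(labels))

    # Confidence values here are made up for the example.
    candidates = [Candidate(1, 0.4), Candidate(3, 0.9), Candidate(5, 0.7)]
    print([c.frame_index for c in select_key_frames(candidates, 2)])
```

Because selection is a simple ranking over precomputed candidates, changing `n` yields a shorter or longer summary without repeating the motion analysis, which is one way to read the scalability claim in the abstract.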
