Abstract

We present a novel method to summarize unconstrained videos using salient montages (i.e., a "melange" of frames in the video as shown in Fig.1, by finding "montageable moments" and identifying the salient people and actions to depict in each montage. Our method aims at addressing the increasing need for generating concise visualizations from the large number of videos being captured from portable devices. Our main contributions are (1) the process of finding salient people and moments to form a montage, and (2) the application of this method to videos taken "in the wild" where the camera moves freely. As such, we demonstrate results on head-mounted cameras, where the camera moves constantly, as well as on videos downloaded from YouTube. In our experiments, we show that our method can reliably detect and track humans under significant action and camera motion. Moreover, the predicted salient people are more accurate than results from state-of-the-art video salieny method [1] . Finally, we demonstrate that a novel "montageability" score can be used to retrieve results with relatively high precision which allows us to present high quality montages to users.We present a novel method to summarize unconstrained videos using salient montages (i.e., a "melange" of frames in the video as shown in Fig.1, by finding "montageable moments" and identifying the salient people and actions to depict in each montage. Our method aims at addressing the increasing need for generating concise visualizations from the large number of videos being captured from portable devices. Our main contributions are (1) the process of finding salient people and moments to form a montage, and (2) the application of this method to videos taken "in the wild" where the camera moves freely. As such, we demonstrate results on head-mounted cameras, where the camera moves constantly, as well as on videos downloaded from YouTube. In our experiments, we show that our method can reliably detect and track humans under significant action and camera motion. Moreover, the predicted salient people are more accurate than results from state-of-the-art video salieny method [1] . Finally, we demonstrate that a novel "montageability" score can be used to retrieve results with relatively high precision which allows us to present high quality montages to users.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call