Abstract

Personal videos often contain visual distractors: objects that are accidentally captured and can draw viewers' attention away from the main subjects. We propose a method to automatically detect and localize these distractors by learning from a manually labeled dataset. To achieve spatially and temporally coherent detection, we extract features at the temporal-superpixel level and classify them with a traditional support-vector-machine-based learning framework. We also experiment with end-to-end learning using convolutional neural networks, which achieves slightly higher performance than the other methods. The classification result is further refined in a postprocessing step based on graph-cut optimization. Experimental results show that our method achieves an accuracy of 81% and a recall of 86%. We demonstrate several ways of removing the detected distractors to improve video quality, including video hole filling, video frame replacement, and camera path replanning. Results of a user study show that our method can significantly improve the aesthetic quality of videos.
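As a concrete illustration of the superpixel-level classification step described above, the sketch below (not the authors' implementation) trains a support vector machine on precomputed per-temporal-superpixel feature vectors and thresholds the predicted probabilities to obtain a distractor mask. The feature layout, the RBF kernel, and the function names are illustrative assumptions.

# Minimal sketch: SVM classification of temporal-superpixel features.
# Assumptions: features are precomputed per superpixel (e.g., color,
# motion, and position statistics); labels are 1 = distractor, 0 = not.
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def train_distractor_classifier(features: np.ndarray, labels: np.ndarray):
    """features: (n_superpixels, n_dims) array; labels: (n_superpixels,)."""
    clf = make_pipeline(
        StandardScaler(),                      # normalize feature ranges
        SVC(kernel="rbf", C=1.0, probability=True),
    )
    clf.fit(features, labels)
    return clf

def predict_distractors(clf, features: np.ndarray, threshold: float = 0.5):
    """Return a boolean mask of superpixels classified as distractors."""
    prob = clf.predict_proba(features)[:, 1]   # P(distractor) per superpixel
    return prob >= threshold

if __name__ == "__main__":
    # Toy example with random data standing in for real superpixel features.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 16))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    clf = train_distractor_classifier(X, y)
    print(predict_distractors(clf, X[:5]))

The per-superpixel probabilities produced this way could serve as unary terms in the graph-cut refinement mentioned in the abstract.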
