Abstract

Automatic video editing is an artistic process involving at least two steps: selecting the most valuable footage in terms of visual quality and the importance of the filmed action, and cutting that footage into a brief and coherent visual story that is interesting to watch. We describe a system that implements this process in a purely data-driven manner. It learns an editing style from samples extracted from content created by professional editors, including motion-picture masterpieces, and applies this data-driven style to cut non-professional videos, with the ability to mimic the individual style of selected reference samples. Visual semantic and aesthetic features are extracted by an ImageNet-trained convolutional neural network, and the editing controller can be trained with an imitation learning or a reinforcement learning algorithm. At test time, the controller shows signs of observing basic cinematographic editing rules learned from the corpus of motion-picture masterpieces. The loss function developed for these learning approaches can also be applied efficiently in a global optimisation setting of the automatic video editing problem using dynamic programming.
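The abstract's final point, that a learned per-shot loss admits globally optimal cutting via dynamic programming, can be illustrated with a minimal sketch. Everything here is hypothetical: `segment_loss` is a toy stand-in for the learned editing loss (in the described system it would come from the trained controller and CNN features), and the preference for roughly three-frame shots is an invented illustration, not a detail from the paper.

```python
# Hypothetical sketch: globally optimal cut selection by dynamic programming,
# assuming some per-segment editing loss has already been learned.

def segment_loss(frames, start, end):
    """Toy stand-in for a learned loss of keeping frames[start:end) as one
    shot; a real system would score visual quality, action importance, and
    continuity with a trained model rather than this invented heuristic."""
    length = end - start
    quality = sum(frames[start:end]) / length       # mean per-frame score
    return (1.0 - quality) + 0.1 * abs(length - 3)  # toy shot-length prior

def best_cuts(frames, max_shot_len=5):
    """Dynamic programme over cut positions: best[i] is the minimal total
    loss of editing frames[:i]; cut[i] records the start of the last shot."""
    n = len(frames)
    best = [float("inf")] * (n + 1)
    best[0] = 0.0
    cut = [0] * (n + 1)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_shot_len), end):
            cand = best[start] + segment_loss(frames, start, end)
            if cand < best[end]:
                best[end] = cand
                cut[end] = start
    # Recover the shot boundaries by walking back through cut[].
    shots, i = [], n
    while i > 0:
        shots.append((cut[i], i))
        i = cut[i]
    return best[n], shots[::-1]

# Usage with made-up per-frame quality scores in [0, 1]:
total_loss, shots = best_cuts([0.9, 0.8, 0.2, 0.7, 0.95, 0.6, 0.85, 0.9])
```

Because `best[end]` only depends on earlier prefixes, the DP is guaranteed to return the minimum-loss segmentation for any per-segment loss, which is the property the abstract exploits for global optimisation.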

