Abstract

This paper presents an application of neural style transfer to video sequences using pre-trained Convolutional Neural Network (CNN) models. By combining content and style losses in an optimization process, the approach transforms video frames into visually striking compositions, opening new avenues for artistic expression in visual storytelling and filmmaking. The procedure begins with preprocessing the video frames to match the input specifications of the pre-trained VGG16 model. A content model is then constructed to extract activations from the 14th layer of VGG16, providing the basis for content representation. Style information is computed by aggregating activation statistics from several convolutional layers, yielding a richer, multi-scale description of the artistic style. The paper introduces a weighted loss function that combines the style and content losses and serves as the objective of the subsequent optimization procedure. Balancing style adherence against content fidelity produces visually compelling synthesis, improving both the quality and the artistic resonance of the video sequences. The paper also presents a detailed analysis of hyperparameters, particularly the style weights, and introduces a pixel value scaling method that improves visual coherence. Experimental results demonstrate the efficiency and adaptability of the proposed approach, showing a seamless integration of content and style for an immersive visual experience. These results mark a step toward unlocking the creative potential of video content and enabling richer visual storytelling.
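To make the described pipeline concrete, the sketch below illustrates a per-frame optimization of the kind the abstract outlines: content features taken from a single mid-level VGG16 layer, style captured as Gram matrices over several convolutional layers, and a weighted combination of the two losses minimized with respect to the frame's pixels. The specific layer indices, loss weights, optimizer settings, and the simple clamping used here in place of the paper's pixel value scaling method are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch of the combined content/style optimization described in the
# abstract, using PyTorch and torchvision's pre-trained VGG16. Layer indices,
# weights, and the optimizer below are assumptions for illustration only.
import torch
import torch.nn.functional as F
from torchvision import models

device = "cuda" if torch.cuda.is_available() else "cpu"
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features.to(device).eval()
for p in vgg.parameters():
    p.requires_grad_(False)

CONTENT_LAYER = 14                       # assumed: a mid-network conv layer (conv3_3)
STYLE_LAYERS = [0, 5, 10, 17, 24]        # assumed: first conv layer of each VGG16 block
CONTENT_WEIGHT, STYLE_WEIGHT = 1.0, 1e5  # assumed balance between the two losses

def extract(x, layers):
    """Run x through VGG16 and collect activations at the requested layer indices."""
    feats = {}
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in layers:
            feats[i] = x
    return feats

def gram(feat):
    """Gram matrix of a feature map: channel-wise correlations that capture style."""
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def stylize_frame(content_frame, style_image, steps=200):
    """Optimize a copy of the frame so its features match the content and style targets.

    Both inputs are assumed to be (1, 3, H, W) tensors already preprocessed for
    VGG16 (resized, values in [0, 1]); ImageNet normalization is omitted for brevity.
    """
    content_feat = extract(content_frame, [CONTENT_LAYER])[CONTENT_LAYER]
    style_grams = {i: gram(f) for i, f in extract(style_image, STYLE_LAYERS).items()}

    output = content_frame.clone().requires_grad_(True)
    optimizer = torch.optim.Adam([output], lr=0.02)

    for _ in range(steps):
        optimizer.zero_grad()
        feats = extract(output, STYLE_LAYERS + [CONTENT_LAYER])
        content_loss = F.mse_loss(feats[CONTENT_LAYER], content_feat)
        style_loss = sum(F.mse_loss(gram(feats[i]), style_grams[i]) for i in STYLE_LAYERS)
        loss = CONTENT_WEIGHT * content_loss + STYLE_WEIGHT * style_loss
        loss.backward()
        optimizer.step()
        with torch.no_grad():            # simple stand-in for the paper's pixel scaling
            output.clamp_(0.0, 1.0)
    return output.detach()
```

In such a setup, each video frame would be preprocessed to the VGG16 input format and passed through stylize_frame, and the ratio of STYLE_WEIGHT to CONTENT_WEIGHT plays the role of the style-weight hyperparameter analyzed in the paper.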
