This research investigates color grading in film, focusing on how color influences the audience's emotional response. The study began with a review of state-of-the-art work on machine-learning analysis of audio–video signals and the emotions associated with them. It then presented the assumptions of the subjective tests designed to refine and validate an emotion model for assigning specific emotional labels to selected film excerpts. The insights gained from these subjective evaluations enabled the creation of a comprehensive database of movie excerpts, which was subsequently used to train and evaluate deep learning models. The second part of the study focused on the intelligent analysis of the audio and video signals that make up the film excerpts, exploring various methods for parameterizing these signals. The audio-only and video-only models that achieved the highest accuracy on the test dataset were combined into a bimodal model integrating both signal types for emotion classification. In tests, the bimodal model achieved higher accuracy than the model relying solely on video classification, and this improvement required only a marginal increase in the model's complexity and number of parameters.
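The abstract does not specify the fusion architecture, so the following is only a minimal sketch, assuming a late-fusion design in PyTorch: embeddings from the best-performing unimodal audio and video models (represented here by placeholder backbones) are concatenated and passed through a small classification head, which is one way the parameter count can stay close to that of the unimodal models. All class names, dimensions, and backbones below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class BimodalEmotionClassifier(nn.Module):
    """Hypothetical late-fusion sketch: concatenate unimodal embeddings
    and classify them with a small fusion head."""

    def __init__(self, audio_backbone, video_backbone,
                 audio_dim=128, video_dim=256, num_emotions=4):
        super().__init__()
        self.audio_backbone = audio_backbone  # pretrained audio branch (assumed)
        self.video_backbone = video_backbone  # pretrained video branch (assumed)
        # Only this small head is added on top of the unimodal models,
        # so the increase in parameters remains marginal.
        self.fusion_head = nn.Sequential(
            nn.Linear(audio_dim + video_dim, 128),
            nn.ReLU(),
            nn.Linear(128, num_emotions),
        )

    def forward(self, audio_input, video_input):
        a = self.audio_backbone(audio_input)   # (batch, audio_dim)
        v = self.video_backbone(video_input)   # (batch, video_dim)
        fused = torch.cat([a, v], dim=-1)      # feature-level fusion
        return self.fusion_head(fused)         # emotion logits


if __name__ == "__main__":
    # Placeholder backbones standing in for the best unimodal models.
    audio_net = nn.Sequential(nn.Flatten(), nn.LazyLinear(128))
    video_net = nn.Sequential(nn.Flatten(), nn.LazyLinear(256))
    model = BimodalEmotionClassifier(audio_net, video_net)

    audio = torch.randn(2, 1, 64, 100)    # e.g. mel-spectrogram batch (assumed shape)
    video = torch.randn(2, 3, 8, 64, 64)  # e.g. short RGB clip batch (assumed shape)
    logits = model(audio, video)
    print(logits.shape)  # torch.Size([2, 4])
```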