Abstract
This paper presents a novel deep Reinforcement Learning (RL) framework for classifying movie scenes based on affect using the face images detected in the video stream as input. Extracting affective information from the video is a challenging task modulating complex visual and temporal representations intertwined with the complex aspects of human perception and information integration. This also makes it difficult to collect a large annotated corpus restricting the use of supervised learning methods. We present an alternative learning framework based on RL that is tolerant to label sparsity and can easily make use of any available ground truth in an online fashion. We employ this modified RL model for the binary classification of whether a scene is funny or not on a dataset of movie scene clips. The results show that our model correctly predicts 72.95% of the time on the 2–3 minute long movie scenes while on shorter scenes the accuracy obtained is 84.13%.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.