Abstract
While watching movies, audience members exhibit both subtle and coarse gestures (e.g., smiles, head-pose changes, fidgeting, stretching) which convey sentiment (i.e., engaged or disengaged) during feature-length movies. Detecting these behaviors with computer vision systems is a very challenging problem, especially in a movie theatre environment. The environment is dark and contains views of people at different scales and viewpoints. Feature-length movies typically run 80–120 minutes, and tracking people uninterrupted for this duration is still an unsolved problem. Facial expressions of audience members are subtle, short, and sparse, making activities difficult to detect and recognize. Finally, annotating audience sentiment at the frame level is prohibitively time consuming. To circumvent these issues, we use an infrared-illuminated test-bed to obtain a visually uniform input of audiences watching feature-length movies. We present a method which can automatically detect changes in behavior (key-gestures) using "key-frames", which can convey audience sentiment. As the number of key-frames is many orders of magnitude lower than the number of frames, the annotation problem is reduced to assigning a sentiment label to each key-frame. Using these discovered key-gestures, we create a movie rating classifier from crowd-sourced ratings and demonstrate its predictive capability. Our dataset consists of over 50 hours of audience behavior collected across 237 subjects.
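To illustrate the key-frame idea, the following is a minimal sketch of change-based key-frame selection using simple frame differencing on synthetic data. This is an illustrative assumption, not the paper's actual key-gesture detector: the function name, threshold, and synthetic frames are all hypothetical.

```python
import numpy as np

def detect_key_frames(frames, threshold=5.0):
    """Return indices of frames whose mean absolute pixel change from the
    previous frame exceeds `threshold`.
    Illustrative only; the paper's key-gesture detection is more involved."""
    key_frames = []
    prev = frames[0].astype(np.float64)
    for i, frame in enumerate(frames[1:], start=1):
        cur = frame.astype(np.float64)
        if np.mean(np.abs(cur - prev)) > threshold:
            key_frames.append(i)
        prev = cur
    return key_frames

# Synthetic example: 100 mostly static frames with an abrupt change at frame 50.
rng = np.random.default_rng(0)
frames = [np.full((48, 64), 100, dtype=np.uint8)
          + rng.integers(0, 3, (48, 64), dtype=np.uint8)
          for _ in range(100)]
for f in frames[50:]:
    f[10:30, 10:30] += 80  # a coarse "gesture" appears and persists

print(detect_key_frames(frames))  # only the change point is kept
```

Because only frames where behavior changes survive the threshold, an annotator labels a handful of key-frames instead of every frame, which is the annotation reduction the abstract describes.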