Abstract

While watching movies, audience members exhibit both subtle and coarse gestures (e.g., smiles, head-pose changes, fidgeting, stretching) that convey sentiment (i.e., engaged or disengaged). Detecting these behaviors with computer vision systems is a very challenging problem, especially in a movie theatre environment: the theatre is dark and contains views of people at different scales and viewpoints. Feature-length movies typically run 80–120 minutes, and tracking people uninterrupted for this duration remains an unsolved problem. Facial expressions of audience members are subtle, short, and sparse, making activities difficult to detect and recognize. Finally, annotating audience sentiment at the frame level is prohibitively time consuming. To circumvent these issues, we use an infrared-illuminated test-bed to obtain visually uniform input of audiences watching feature-length movies. We present a method that automatically detects changes in behavior (key-gestures) via “key-frames”, which convey audience sentiment. Because the number of key-frames is many orders of magnitude lower than the number of frames, the annotation problem is reduced to assigning a sentiment label to each key-frame. Using these discovered key-gestures, we build a movie rating classifier from crowd-sourced ratings and demonstrate its predictive capability. Our dataset consists of over 50 hours of audience behavior collected across 237 subjects.
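
To make the key-frame idea concrete, here is a minimal sketch of change-point-style key-frame selection, assuming per-frame behavior features (e.g., pooled motion or pose descriptors) have already been extracted from the infrared video. The function name detect_key_frames, the z-score threshold, and the minimum-gap parameter are illustrative assumptions, not the paper's actual pipeline.

    import numpy as np

    def detect_key_frames(features, z_thresh=2.0, min_gap=30):
        """Return indices of frames where behavior changes sharply.

        features : (T, D) array of per-frame descriptors.
        z_thresh : change magnitude, in standard deviations, needed
                   to nominate a key-frame.
        min_gap  : minimum spacing (in frames) between key-frames.
        """
        # Frame-to-frame change magnitude.
        diffs = np.linalg.norm(np.diff(features, axis=0), axis=1)
        # Standardize so z_thresh is in units of standard deviations.
        z = (diffs - diffs.mean()) / (diffs.std() + 1e-8)

        key_frames, last = [], -min_gap
        for t, score in enumerate(z, start=1):
            # Keep a frame only if the change is strong and it is not
            # too close to the previously selected key-frame.
            if score > z_thresh and t - last >= min_gap:
                key_frames.append(t)
                last = t
        return key_frames

    # Toy usage: simulate 10 minutes at 24 fps with 64-D features per frame.
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(10 * 60 * 24, 64)).cumsum(axis=0)
    print(detect_key_frames(feats)[:5])

Under this sketch, each selected key-frame would receive a single sentiment label, which is the annotation reduction the abstract describes.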
