Understanding human behavior with computer vision techniques that recognize body posture, gait, hand gestures, and facial expressions has recently attracted significant research activity. Emotion (affect) correlates directly with a person's mental state and intentions, from which his or her present and future states can be understood and predicted. As a case study, we demonstrate the utility of deep learning in understanding videos of Indian classical dance (ICD) forms. ICD comprises hand gestures, body poses, and facial expressions enacted by the performer, along with the accompanying music and songs/Shlokas. In this work, we attempt to decipher the meaning of the Navarasas associated with ICD. Recognizing these emotions from images/videos of ICD is challenging due to factors such as ambiguity in the enactment, costume, make-up, and clutter. We propose a dataset of the various emotions (Navarasas) enacted in ICD, comprising RGB images with associated depth information collected using the Microsoft Kinect sensor. We also propose a deep learning framework based on convolutional neural networks to understand the semantic meaning of ICD videos by recognizing the Navarasas enacted by the performer.