Abstract

Head gesture videos of a person carry rich information about the individual. Automatically understanding these videos can empower many useful human-centered applications in areas such as smart health, education, work safety, and security. Understanding a video's content requires translating the low-level head gesture signals it carries, which capture characteristics of both human posture and motion, into high-level semantic labels. To this end, we propose a hierarchical model for learning to understand head gesture videos. Given a head gesture video of arbitrary length, the model first segments the full-length video into multiple short clips for clip-based feature extraction. Multiple base feature extractors are then tuned independently via a set of peripheral learning tasks, without consuming any labels of the goal task. These independently derived base features are subsequently aggregated through a multi-task learning framework, coupled with a feature dimensionality reduction module, to learn the end video understanding task in a weakly supervised manner, utilizing the limited amount of video labels available for the goal task. Experimental results show that the hierarchical model outperforms multiple state-of-the-art peer methods across diverse video understanding tasks.
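
To make the pipeline concrete, below is a minimal PyTorch sketch of the hierarchy the abstract describes: clip segmentation, independently tuned base feature extractors, feature aggregation with dimensionality reduction, and one head per goal task. All module names, dimensions, the clip length, and the simple linear extractors are illustrative assumptions, not the authors' implementation; the paper's actual peripheral tasks and network details are not reproduced here.

```python
# Hypothetical sketch of the hierarchical model; every architectural
# choice below (clip length, linear layers, dimensions) is an assumption.
import torch
import torch.nn as nn


def segment_into_clips(video: torch.Tensor, clip_len: int = 16) -> torch.Tensor:
    """Split a (T, C, H, W) video of arbitrary length into fixed-length clips.

    Assumes T >= clip_len; trailing frames that do not fill a clip are dropped.
    """
    t = (video.shape[0] // clip_len) * clip_len
    return video[:t].reshape(-1, clip_len, *video.shape[1:])


class BaseFeatureExtractor(nn.Module):
    """One of several extractors, each assumed to be pre-tuned on a
    peripheral task (e.g. posture or motion estimation) without using
    any goal-task labels."""

    def __init__(self, in_dim: int, feat_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(start_dim=1),
                                 nn.Linear(in_dim, feat_dim),
                                 nn.ReLU())

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (n_clips, clip_len, C, H, W) -> per-clip features,
        # averaged over clips into one video-level feature vector.
        return self.net(clips).mean(dim=0)


class HierarchicalModel(nn.Module):
    """Aggregates the base features, reduces their dimensionality, and
    trains one output head per goal task (multi-task learning)."""

    def __init__(self, in_dim: int, feat_dim: int = 128, reduced_dim: int = 32,
                 n_extractors: int = 3, n_tasks: int = 2, n_classes: int = 2):
        super().__init__()
        self.extractors = nn.ModuleList(
            [BaseFeatureExtractor(in_dim, feat_dim) for _ in range(n_extractors)])
        self.reduce = nn.Linear(n_extractors * feat_dim, reduced_dim)
        self.heads = nn.ModuleList(
            [nn.Linear(reduced_dim, n_classes) for _ in range(n_tasks)])

    def forward(self, video: torch.Tensor):
        clips = segment_into_clips(video)
        feats = torch.cat([e(clips) for e in self.extractors])  # aggregation
        z = torch.relu(self.reduce(feats))                      # dim. reduction
        return [head(z) for head in self.heads]                 # one per task


# Example: an 80-frame video yields one prediction per goal task.
model = HierarchicalModel(in_dim=16 * 3 * 32 * 32)
video = torch.randn(80, 3, 32, 32)
task_logits = model(video)
```

In this reading, weak supervision enters only at the final stage: the extractors are fixed by the peripheral tasks, so the limited goal-task labels need only train the small reduction and head layers.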
