Abstract

To recognize people in unconstrained video, one must exploit the identity information present across multiple frames as well as the accompanying dynamic signature. These identity cues include face, body, and motion. Our approach is based on video-dictionaries for face and body, which generalize sparse representation and dictionary learning from still images to video. We design the video-dictionaries to implicitly encode temporal, pose, and illumination information. Because the video-dictionaries are learned for both face and body, the algorithm encodes both identity cues. To capture nonlinearities, we further apply kernel methods when learning the dictionaries. We demonstrate our method on the Multiple Biometric Grand Challenge, Face and Ocular Challenge Series, Honda/UCSD, and UMD data sets, which consist of unconstrained video sequences. Our experimental results on these four data sets compare favorably with those published in the literature, and we show that fusing face and body identity cues improves performance over face alone.
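To give a concrete sense of the sparse-representation idea the abstract builds on, the sketch below shows a minimal dictionary-based classifier: a query is coded greedily over a dictionary whose columns (atoms) are labeled by identity, and it is assigned to the class whose atoms reconstruct it with the smallest residual. This is an illustrative sketch only, not the authors' method; the function names, the greedy orthogonal matching pursuit coder, and the toy dictionary are all assumptions for exposition.

```python
# Illustrative sketch of sparse-representation classification over a
# labeled dictionary (hypothetical names; not the paper's implementation).
import numpy as np

def omp(D, y, k):
    """Greedy orthogonal matching pursuit: approximate y with up to k atoms of D."""
    residual = y.astype(float).copy()
    support = []
    x = np.zeros(D.shape[1])
    coef = np.zeros(0)
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # Re-fit the coefficients on the selected atoms (least squares).
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x[support] = coef
    return x

def classify(D, labels, y, k=2):
    """Assign y to the class whose atoms yield the smallest reconstruction residual."""
    x = omp(D, y, k)
    best, best_err = None, np.inf
    for c in set(labels):
        # Zero out coefficients that do not belong to class c.
        mask = np.array([lab == c for lab in labels])
        xc = np.where(mask, x, 0.0)
        err = np.linalg.norm(y - D @ xc)
        if err < best_err:
            best, best_err = c, err
    return best

# Toy example: four unit-norm atoms in R^4, two per identity.
D = np.array([[1, 1/np.sqrt(2), 0, 0],
              [0, 1/np.sqrt(2), 0, 0],
              [0, 0,            1, 0],
              [0, 0,            0, 1]], dtype=float)
labels = ["A", "A", "B", "B"]
print(classify(D, labels, np.array([1.0, 0.0, 0.0, 0.0])))  # class "A"
```

In the paper's setting the atoms would be learned from face and body video frames (rather than fixed exemplars), and the coding step would operate in a kernel-induced feature space; the residual-based class decision is the part this sketch illustrates.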
