Abstract

Progress in the automatic detection and identification of humans in video, given a minimal number of labelled faces as training data, is described. This is an extremely challenging problem owing to the many sources of variation in a person's imaged appearance: pose variation, scale, facial expression, illumination, partial occlusion, motion blur, etc. The method developed in this work combines approaches from computer vision, for detection and pose estimation, with those from machine learning for classification. A ‘generative’ model of a person's head is defined consisting of a coarse 3-D model and multiple texture maps. This allows faces to be rendered with a variety of facial expressions and at poses differing from those of the training data. It is shown that the identity of a target face can then be determined by first proposing faces with similar pose, and then classifying the target face as one of the proposed faces or not. Furthermore, the texture maps of the model can be automatically updated as new poses and expressions are detected. Results of detecting three characters in a TV situation comedy are demonstrated.
