Skin Region Extraction and Person-Independent Deformable Face Templates for Fast Video Indexing

Simon Clippingdale, Mahito Fujii

doi:10.1109/ism.2011.75

Abstract

We describe a face tracking and recognition system for video and multimedia indexing that handles faces at variable poses (left-right and up-down), as well as deformations due to facial expressions and speech, by employing person-independent deformable templates at multiple poses on the view-sphere. An earlier version of the system handled variable poses (left-right only) by employing person-specific templates registered for each target individual at multiple poses. The new system speeds up processing by (i) extracting and restricting attention to skin-color regions, (ii) performing recognition using person-specific templates at near-frontal poses only, and (iii) tracking at non-frontal poses using the person-independent templates. Registration is also simplified, since multiple views of each target individual are no longer required. The cost is a loss of recognition functionality at poses far from frontal: the system instead remembers the identity of each individual from near-frontal matches and tracks between them. We describe the skin region extraction process and the process by which the person-independent templates are constructed off-line from bootstrap face images of multiple non-target individuals, and we present experimental results showing the system in operation. Finally, we discuss remaining issues in the practical application of the system to video and multimedia archive indexing.
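
The division of labor between recognition at near-frontal poses and identity carried through tracking at other poses can be illustrated with a minimal sketch. It is not the authors' implementation: the `Observation` record, the `NEAR_FRONTAL_DEG` threshold, and the function names are all hypothetical, and the template matching itself is assumed to have already produced a per-frame pose estimate and an optional identity match.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical threshold (degrees): the abstract does not state how
# "near-frontal" is defined, so this value is purely illustrative.
NEAR_FRONTAL_DEG = 15.0


@dataclass
class Observation:
    """One tracked face in one frame (hypothetical record layout)."""
    yaw: float             # left-right pose in degrees, 0 = frontal
    pitch: float           # up-down pose in degrees, 0 = frontal
    match: Optional[str]   # identity from a person-specific template, if any


def is_near_frontal(obs: Observation) -> bool:
    """True when both pose angles are within the assumed threshold."""
    return abs(obs.yaw) <= NEAR_FRONTAL_DEG and abs(obs.pitch) <= NEAR_FRONTAL_DEG


def propagate_identity(track: List[Observation]) -> List[Optional[str]]:
    """Label every frame of one face track.

    Identity is accepted only from near-frontal matches; at all other
    frames the most recently accepted identity is carried forward.
    """
    labels: List[Optional[str]] = []
    current: Optional[str] = None
    for obs in track:
        if is_near_frontal(obs) and obs.match is not None:
            current = obs.match
        labels.append(current)
    return labels


track = [
    Observation(yaw=40.0, pitch=0.0, match=None),   # non-frontal, unknown so far
    Observation(yaw=10.0, pitch=5.0, match="A"),    # near-frontal match
    Observation(yaw=35.0, pitch=0.0, match=None),   # tracked only
    Observation(yaw=50.0, pitch=10.0, match="B"),   # non-frontal match is ignored
    Observation(yaw=0.0, pitch=0.0, match=None),    # near-frontal, no match
]
labels = propagate_identity(track)
```

In this sketch the frames before the first near-frontal match stay unlabeled. An indexing application could instead back-fill them once an identity is established, but the abstract does not say which choice the system makes.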

Full Text