Face biometrics have achieved remarkable performance over the past decades, but spoofing attacks on static faces pose a threat to information security. There is a growing demand for stable, discriminative biological modalities that are hard to mimic or deceive. Speech-driven 3D facial motion is a distinctive and measurable behavioral signature that is promising for biometrics. In this paper, we propose a novel 3D behaviometrics framework based on a “3D visual passcode” derived from speech-driven 3D facial dynamics. The 3D facial dynamics are jointly represented by 3D-keypoint-based measurements and 3D shape patch features, extracted from both static and speech-driven dynamic facial regions. An ensemble of subject-specific classifiers is then trained over selected discriminative features, yielding a discriminative representation of speech-driven 3D facial dynamics. We construct the first publicly available Speech-driven 3D Facial Motion dataset (S3DFM), which includes 2D-3D face video plus audio samples from 77 participants. Experiments on S3DFM show that the proposed pipeline achieves a face identification rate of 96.1%. We present detailed discussions of anti-spoofing, head-pose variation, video frame rate, and applicable use cases, and we compare against baselines built on “deep” and “shallow” 2D face features.
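To make the classifier stage concrete, the sketch below shows one plausible reading of it: a minimal ensemble of per-subject classifiers, each trained over its own set of selected discriminative features, with identification decided by the highest subject-specific score. The synthetic data shapes, the ANOVA-based feature selection, and the SVM back-end are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch (not the authors' code): an ensemble of
# subject-specific classifiers over selected features, as the
# abstract describes. Features here are random stand-ins for the
# 3D-keypoint measurements and 3D shape patch descriptors.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_subjects, clips_per_subject, n_feats = 77, 10, 512  # assumed sizes
y = np.repeat(np.arange(n_subjects), clips_per_subject)  # subject labels
X = rng.normal(size=(y.size, n_feats))  # placeholder feature vectors

# One binary (subject vs. rest) classifier per enrolled subject, each
# with its own feature selection (ANOVA F-score as a placeholder for
# the paper's discriminative-feature selection).
ensemble = {}
for subject in range(n_subjects):
    clf = make_pipeline(
        StandardScaler(),
        SelectKBest(f_classif, k=64),  # keep the 64 most discriminative dims
        SVC(kernel="rbf"),
    )
    clf.fit(X, (y == subject).astype(int))
    ensemble[subject] = clf

def identify(x):
    """Identify a probe clip: the subject whose classifier scores highest."""
    scores = {s: clf.decision_function(x[None, :])[0]
              for s, clf in ensemble.items()}
    return max(scores, key=scores.get)

print("predicted subject:", identify(X[0]))
```

The per-subject design means enrolling or removing a participant only touches one classifier, rather than retraining a single multi-class model; whether the paper exploits this property is not stated in the abstract.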