Abstract

Lip-based biometric authentication (LBBA) is an authentication method based on a person's lip movements during speech, captured as video data. LBBA can exploit both the physical and behavioral characteristics of lip movements without requiring any sensory equipment beyond an RGB camera. Current approaches employ deep siamese neural networks trained with one-shot learning to generate embedding vectors from lip movement features. However, most of these approaches do not discriminate based on speech content, which makes them vulnerable to video replay attacks. Moreover, there is a lack of comprehensive analysis of how distinct lip characteristics, or challenging phrases with significant word overlap, affect authentication performance in one-shot approaches. To address this, we introduce the GRID-CCP dataset and train a siamese neural network with 3D convolutions and recurrent neural network layers that additionally discriminates based on speech content. For loss calculation, we propose a custom triplet loss function for efficient and customizable batch-wise hard-negative mining. Our experimental results, using an open-set protocol, demonstrate a False Acceptance Rate (FAR) of 3.2% and a False Rejection Rate (FRR) of 3.8% on the test set of the GRID-CCP dataset. Finally, we analyze the influence and discriminative power of behavioral and physical features in LBBA.
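To make the loss formulation concrete, the sketch below shows one common way to implement batch-wise hard-negative mining for a triplet loss in PyTorch. It is an illustrative assumption, not the authors' exact custom loss: the margin value, the label construction (e.g. combined identity-and-phrase ids so that negatives can differ in speaker, speech content, or both), and the choice of Euclidean distance are all placeholders.

# Minimal sketch (not the paper's exact implementation) of a batch-hard
# triplet loss over siamese embeddings. Assumes each batch item carries a
# single integer label (e.g. an identity-and-phrase combination) so that
# negatives can differ in identity, speech content, or both.
import torch
import torch.nn.functional as F

def batch_hard_triplet_loss(embeddings: torch.Tensor,
                            labels: torch.Tensor,
                            margin: float = 0.2) -> torch.Tensor:
    """embeddings: (B, D) embedding vectors from the siamese network.
    labels:     (B,) integer class ids.
    Returns the mean triplet loss over anchors, using the hardest positive
    and hardest negative found inside the batch for each anchor."""
    # Pairwise Euclidean distance matrix, shape (B, B).
    dists = torch.cdist(embeddings, embeddings, p=2)

    # Boolean masks for valid positive / negative pairs.
    same = labels.unsqueeze(0) == labels.unsqueeze(1)                  # (B, B)
    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)
    pos_mask = same & ~eye   # same label, excluding the anchor itself
    neg_mask = ~same         # different label

    # Hardest positive: farthest in-batch sample sharing the anchor's label.
    hardest_pos = (dists * pos_mask.float()).max(dim=1).values
    # Hardest negative: closest in-batch sample with a different label
    # (invalid entries are masked with +inf before taking the minimum).
    inf = torch.full_like(dists, float("inf"))
    hardest_neg = torch.where(neg_mask, dists, inf).min(dim=1).values

    return F.relu(hardest_pos - hardest_neg + margin).mean()

For each anchor, the hardest positive (farthest same-label sample) and hardest negative (closest different-label sample) within the batch are selected, which is what makes the mining batch-wise and avoids an explicit offline search over all possible triplets.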
