Domain-Specific Adaptation of CNN for Detecting Face Presentation Attacks in NIR

Ketan Kotwal, Xu Wenkang, Sushil Bhattacharjee, Zohreh Mostaani, Sebastien Marcel, Zhao Yaxi, Huang Wei, Philip Abbet

doi:10.1109/tbiom.2022.3143569

Abstract

As the automotive industry moves towards personalized applications and experiences, identifying the person inside the vehicle becomes necessary, and it must be carried out in a secure manner. In this paper, we propose a unique face presentation attack detection (PAD) system for operation inside a passenger vehicle. A typical in-vehicular face PAD system must function under several constraints, such as bounded sensing (imaging) capabilities, limited computing resources on embedded devices, real-time inference, and, essentially, very high accuracy. In this work, we develop a face PAD system for the automotive domain, relying on a single NIR camera, to continually verify whether the driver's face is bona-fide or not. Our work has two main contributions. First, a lightweight face PAD framework has been developed using a 9-layer convolutional neural network (CNN). With its compact size and limited set of operators, it can be deployed on a resource-constrained embedded device to achieve near real-time inference. To alleviate the problem of limited training data (face PAD in NIR) for a given system, we develop an efficient mechanism to obtain this CNN by combining adaptation of domain-specific layers with task-specific fine-tuning of a base CNN. As the second contribution, we collect a large face PAD dataset with 5800+ videos, acquired under NIR (940 nm) illumination, for in-vehicular use-cases. This dataset, named VFPAD, captures several real-world variations in environmental settings, illumination, and the subject's pose and appearance.
Based on the VFPAD dataset, we demonstrate that the proposed face PAD method achieves very high performance (overall accuracy ≈ 98.0%) and also outperforms several baseline face PAD methods. The dataset will be shared with the wider scientific community for research purposes.

Full Text