To detect 3D mask face presentation attacks, remote photoplethysmography (rPPG), a biomedical technique that remotely measures the heartbeat signal with an ordinary RGB camera, has been adopted as a robust liveness cue. Although existing rPPG-based solutions exhibit strong performance in experiments, the required observation time is too long (10-12 seconds) to be user-friendly in real applications such as e-payment and smartphone unlocking. To shorten the observation time to within one second, we propose a fast rPPG-based 3D mask presentation attack detection (PAD) method that analyzes the similarity of rPPG signals in the time domain. In particular, based on facial and background local rPPG signals, we design a set of temporal similarity features that capture robust properties of rPPG signal shape and phase. In the same direction, we further refine the traditional rPPG extractor into a learnable network that cooperates with our TSrPPG features for better robustness. An effective yet lightweight spatiotemporal convolutional network is constructed with a self-supervised learning strategy, aiming to enhance the consistency of genuine facial rPPG signals and reduce the correlation of rPPG signals on masked faces. Extensive experiments are conducted on 3DMAD, HKBU-MARs V1+ and V2+, and CSMAD, which together comprise 18,772 short-term video slots covering a large number of real-world variations in mask type, mask transmittance, lighting condition, recording device, resolution of the facial region, and compression configuration. Our proposed method retains the strong performance of rPPG-based solutions with only a one-second observation and outperforms state-of-the-art competitors in discriminability and generalizability. Evaluations on print attacks, display attacks, and disguise attacks with transparent masks, make-up, and tattoos further demonstrate its potential for handling a wider variety of attacks.
To the best of our knowledge, this is the first work that addresses the observation-time issue of rPPG-based 3D mask PAD.
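To illustrate the kind of time-domain similarity analysis the abstract describes, the following is a minimal sketch, not the paper's actual TSrPPG implementation: it compares two local rPPG signals (e.g. a facial patch versus a background patch) via normalized cross-correlation, where the peak value serves as a shape-similarity feature and its lag as a phase feature. The function name, feature set, and signal parameters are illustrative assumptions.

```python
# Hypothetical sketch of time-domain similarity features between two local
# rPPG signals. The live/mask signals below are synthetic stand-ins: a sine
# wave as a pulse proxy and white noise as a pulse-free (masked) signal.
import numpy as np

def similarity_features(sig_a, sig_b):
    """Return (shape_similarity, phase_lag) for two equal-length 1D signals.

    shape_similarity: peak normalized cross-correlation (waveform shape).
    phase_lag: lag in samples at which that peak occurs (phase offset).
    """
    a = (sig_a - sig_a.mean()) / (sig_a.std() + 1e-8)
    b = (sig_b - sig_b.mean()) / (sig_b.std() + 1e-8)
    xcorr = np.correlate(a, b, mode="full") / len(a)
    lags = np.arange(-len(a) + 1, len(a))
    best = int(np.argmax(xcorr))
    return float(xcorr[best]), int(lags[best])

# One-second observation window at 30 fps, matching the short-term setting.
fps = 30
t = np.arange(fps) / fps
pulse = np.sin(2 * np.pi * 1.2 * t)   # ~72 bpm heartbeat proxy
shifted = np.roll(pulse, 2)           # same pulse with a small phase offset
rng = np.random.default_rng(0)
noise = rng.standard_normal(t.size)   # mask-like signal without a pulse

sim_live, lag_live = similarity_features(pulse, shifted)
sim_mask, _ = similarity_features(pulse, noise)
```

Under this sketch, two patches on a genuine face share the same underlying pulse and yield a high peak correlation at a small lag, whereas a masked face produces noise-like signals whose similarity to any pulse is low, which is the intuition behind discriminating the two within a one-second window.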