Abstract

Without dedicated countermeasures, facial biometric systems can be spoofed with printed photos, replay attacks, silicone masks, or even a 3D mask of the targeted person. The threat of presentation attacks therefore needs to be addressed to strengthen the security of biometric systems. Since a 2D convolutional neural network (CNN) captures only static features from video frames, camera motion might hinder the performance of modern CNNs for video-based presentation attack detection (PAD). Inspired by egomotion theory, we introduce an adaptive spatiotemporal global sampling (ASGS) technique that compensates for camera motion and uses the resulting motion estimate to encode the appearance and dynamics of a video sequence into a single RGB image. This is achieved by adaptively splitting the video into small segments and capturing the global motion within each segment. The global motion is estimated in four key steps: dense sampling, FREAK feature extraction and matching, similarity transformation, and an aggregation function. This allows deep models pre-trained on still images to be used for video-based PAD. Moreover, interpretation of ASGS reveals that the regions most important to the PAD decision are consistent with motion cues associated with attack artifacts, i.e., hand movement, material reflection, and expression changes. Extensive experiments on four standard face PAD databases demonstrate the effectiveness of the proposed approach and encourage further study in this domain.
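As a rough illustration of the four estimation steps named above (dense sampling, FREAK extraction and matching, similarity transformation, aggregation), the sketch below computes a per-segment global motion estimate between consecutive frames with OpenCV. The grid spacing, the RANSAC-based fitting via cv2.estimateAffinePartial2D, and the element-wise mean aggregation are assumptions made for this sketch, not the paper's exact ASGS formulation.

```python
import cv2
import numpy as np

def estimate_segment_motion(frames, step=16):
    """Sketch of per-segment global motion estimation (assumed variant of ASGS):
    dense grid sampling, FREAK description/matching, similarity transform,
    and a simple mean aggregation over the segment."""
    freak = cv2.xfeatures2d.FREAK_create()              # requires opencv-contrib-python
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    transforms = []
    for prev, curr in zip(frames[:-1], frames[1:]):
        g0 = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
        g1 = cv2.cvtColor(curr, cv2.COLOR_BGR2GRAY)
        # Dense sampling: keypoints on a regular grid rather than a detector.
        kps = [cv2.KeyPoint(float(x), float(y), float(step))
               for y in range(step, g0.shape[0] - step, step)
               for x in range(step, g0.shape[1] - step, step)]
        k0, d0 = freak.compute(g0, kps)
        k1, d1 = freak.compute(g1, kps)
        if d0 is None or d1 is None:
            continue
        # Match binary FREAK descriptors between the two frames.
        matches = matcher.match(d0, d1)
        if len(matches) < 4:
            continue
        src = np.float32([k0[m.queryIdx].pt for m in matches])
        dst = np.float32([k1[m.trainIdx].pt for m in matches])
        # 4-DoF similarity transform between consecutive frames, RANSAC-filtered.
        M, _ = cv2.estimateAffinePartial2D(src, dst, method=cv2.RANSAC)
        if M is not None:
            transforms.append(M)
    # Aggregation over the segment: a plain mean here (an assumption).
    return np.mean(transforms, axis=0) if transforms else None
```

In the paper, the resulting per-segment motion is used to encode appearance and dynamics into a single RGB image fed to an image-pretrained CNN; that encoding step is not shown in this sketch.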
