Abstract
Recent research has demonstrated that lip-based speaker authentication systems can not only achieve good authentication performance but also guarantee liveness. However, with modern DeepFake technology, attackers can produce a talking video of a user without leaving any visually noticeable traces of forgery. This can seriously compromise traditional face-based or lip-based authentication systems. To defend against sophisticated DeepFake attacks, a new visual speaker authentication scheme based on a deep convolutional neural network (DCNN) is proposed in this paper. The proposed network is composed of two functional parts, namely, the Fundamental Feature Extraction network (FFE-Net) and the Representative lip feature extraction and Classification network (RC-Net). The FFE-Net provides the fundamental information for speaker authentication. As static lip shape and lip appearance are vulnerable to DeepFake attacks, dynamic lip movement is emphasized in the FFE-Net. The RC-Net extracts high-level lip features that discriminate the client from human imposters while capturing the client's talking style. A multi-task learning scheme is designed, and the proposed network is trained end-to-end. Experiments on the GRID and MOBIO datasets demonstrate that the proposed approach achieves accurate authentication against human imposters and is considerably more robust to DeepFake attacks than three state-of-the-art visual speaker authentication algorithms. It is also worth noting that the proposed approach does not require any prior knowledge of the DeepFake spoofing method and can therefore be applied to defend against different kinds of DeepFake attacks.
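To make the two-part architecture concrete, the following is a minimal PyTorch sketch of how an FFE-Net/RC-Net pipeline with multi-task training could be organized. It is an illustration under assumptions, not the authors' implementation: the layer sizes, the use of 3D convolutions to emphasize temporal (dynamic lip movement) information, the bidirectional GRU for talking-style modeling, and the choice of a word-content classifier as the auxiliary task (`aux_head`, `num_words`) are all hypothetical; the abstract only specifies the two-network decomposition, the emphasis on lip dynamics, and end-to-end multi-task training.

```python
import torch
import torch.nn as nn

class FFENet(nn.Module):
    """Fundamental Feature Extraction network (sketch).
    3D convolutions run over the lip-frame sequence so that dynamic lip
    movement, rather than static lip shape/appearance, dominates the features."""
    def __init__(self, out_channels: int = 64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=(5, 3, 3), padding=(2, 1, 1)),
            nn.BatchNorm3d(32), nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),          # pool space, keep time
            nn.Conv3d(32, out_channels, kernel_size=(5, 3, 3), padding=(2, 1, 1)),
            nn.BatchNorm3d(out_channels), nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),
        )

    def forward(self, x):            # x: (B, 3, T, H, W) lip-region clip
        return self.features(x)      # (B, C, T, H', W')

class RCNet(nn.Module):
    """Representative lip feature extraction and Classification network (sketch).
    Pools the fundamental features over space, models the temporal talking
    style, and emits one output per training task."""
    def __init__(self, in_channels: int = 64, hidden: int = 128, num_words: int = 51):
        super().__init__()
        self.spatial_pool = nn.AdaptiveAvgPool3d((None, 1, 1))  # keep T, squash H, W
        self.gru = nn.GRU(in_channels, hidden, batch_first=True, bidirectional=True)
        self.auth_head = nn.Linear(2 * hidden, 1)         # client vs. imposter logit
        self.aux_head = nn.Linear(2 * hidden, num_words)  # hypothetical auxiliary task

    def forward(self, f):                                 # f: (B, C, T, H', W')
        f = self.spatial_pool(f).squeeze(-1).squeeze(-1)  # (B, C, T)
        f = f.transpose(1, 2)                             # (B, T, C)
        _, h = self.gru(f)                                # h: (2, B, hidden)
        h = torch.cat([h[-2], h[-1]], dim=1)              # both directions
        return self.auth_head(h), self.aux_head(h)

# End-to-end multi-task training step (sketch): joint loss over both heads.
ffe, rc = FFENet(), RCNet()
clip = torch.randn(4, 3, 25, 64, 64)                     # batch of 25-frame lip clips
auth_logit, aux_logit = rc(ffe(clip))
is_client = torch.ones(4, 1)                             # 1 = genuine client
word_label = torch.randint(0, 51, (4,))
loss = nn.BCEWithLogitsLoss()(auth_logit, is_client) \
     + 0.5 * nn.CrossEntropyLoss()(aux_logit, word_label)  # 0.5 weight is assumed
loss.backward()
```

Because the authentication head is trained only to separate the client's dynamic talking style from that of other speakers, such a design needs no examples of any particular DeepFake method, which is consistent with the abstract's claim of generalizing across spoofing techniques.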