Abstract

Voice assistants such as Amazon Alexa, Apple Siri and Tmall Genie, which use voice biometrics for identity authentication, are becoming pervasive in our daily lives. However, voice assistants are vulnerable to replay attacks due to the open nature of voice-input channels. An attacker can record the voice commands of victims and replay them to spoof voice assistants. Existing liveness detection approaches are mostly based on machine learning methods, which are expensive and complex. Recently, several approaches have been proposed that leverage human-specific voice features or the distinctness of voices played through loudspeakers. However, they require the user and the voice assistant to be in a fixed position and at a very close distance, which is not user-friendly in practice. This paper proposes LiveEar, an efficient and easy-to-use liveness detection system for voice assistants. LiveEar utilizes the differences in phoneme positions between live-human voices and voices replayed through loudspeakers. Specifically, it calculates the time-difference-of-arrival (TDoA) in a sequence of phoneme sounds to the microphone on the voice assistant. Then, an SVM-based classification model is trained with the extracted TDoA features. This paper implements a prototype of LiveEar and evaluates its performance using real-world data. Results show that LiveEar achieves high detection accuracy in various flexible positions, with negligible runtime overhead.
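
To make the TDoA step concrete, the following is a minimal sketch of how a per-phoneme delay could be estimated. It is not LiveEar's implementation: it assumes two synchronized microphone channels and phoneme segments that have already been cut out, and it uses a plain brute-force cross-correlation. The names `estimate_tdoa`, `tdoa_features`, and `gaussian_pulse` are hypothetical, and the synthetic pulses stand in for real phoneme audio.

```python
import math


def estimate_tdoa(sig_a, sig_b, sample_rate, max_lag):
    """Estimate the delay of sig_b relative to sig_a, in seconds.

    Assumption: sig_a and sig_b are equal-rate recordings of the same sound
    from two microphones. A positive result means the sound reached
    microphone B later than microphone A.
    """
    n = min(len(sig_a), len(sig_b))
    best_lag, best_score = 0, float("-inf")
    # Try every integer lag and keep the one with the highest correlation.
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += sig_a[i] * sig_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / sample_rate


def tdoa_features(segments_a, segments_b, sample_rate, max_lag):
    """Build a feature vector with one TDoA value per phoneme segment."""
    return [
        estimate_tdoa(a, b, sample_rate, max_lag)
        for a, b in zip(segments_a, segments_b)
    ]


def gaussian_pulse(length, center, width):
    """Synthetic stand-in for a short phoneme sound (illustration only)."""
    return [math.exp(-(((i - center) / width) ** 2)) for i in range(length)]


if __name__ == "__main__":
    rate = 16000  # assumed sampling rate in Hz
    # Two hypothetical phonemes, arriving 3 and 1 samples later at mic B.
    mic_a = [gaussian_pulse(200, 80, 5), gaussian_pulse(200, 100, 5)]
    mic_b = [gaussian_pulse(200, 83, 5), gaussian_pulse(200, 101, 5)]
    print(tdoa_features(mic_a, mic_b, rate, max_lag=20))
```

A vector like the one returned by `tdoa_features` is the kind of input an SVM classifier could be trained on; the classifier itself is left out of this sketch.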
