Abstract

Voice assistants support contactless control of smart devices and are thus regarded as a holy grail of human-computer interaction. However, recent studies reveal that an adversary can manipulate devices with malicious voice commands. This security risk arises because liveness detection is executed only once and no safeguard module remains after service activation. Therefore, identifying the speaker type (i.e., human articulators or loudspeakers) throughout an entire interaction session is critical to protecting voice-driven services. In this paper, we propose LiveProbe, a continuous voice liveness detection approach that leverages the unique energy response patterns in frequency bands induced by distinct voice generation mechanisms. The rationale behind LiveProbe is twofold: human articulators reshape the initial voice through finely coordinated movements of the vocal organs, which act as band-pass filters and generate unique energy responses; in contrast, the internal modules of loudspeakers are fixed in position and cannot reproduce this response characteristic. To that end, we first study the voice generation mechanisms of the two speaker types that cause the spectral differences. We then construct signal processing and deep-learning modules to extract liveness features. Notably, our approach does not interfere with normal voice interaction and does not require users to carry customized sensors. Experiments demonstrate its effectiveness against potential attacks, with a false acceptance rate of 0.51%.
