Abstract

Voice input users’ speech recordings are collected by service providers and shared with third parties, who may abuse users’ voiceprints, identify users by voice, and learn their sensitive speech content. In this work, we design Speech Sanitizer to perturb users’ speech recordings so that the sanitized speech can be safely shared with third parties. First, we desensitize speech content by identifying sensitive words, localizing them in the audio using DTW-based keyword spotting, and substituting them with safe words. Both common and personalized sensitive words are identified and replaced. Then, we anonymize users’ voiceprints with a carefully designed voice conversion mechanism that is resistant to de-anonymization attacks. At the same time, we aim to preserve the utility of the sanitized speech, measured by the accuracy of speech recognition performed on it. We implement Speech Sanitizer and present extensive experimental results that validate the effectiveness and efficiency of our algorithms. The results demonstrate that Speech Sanitizer reduces the chance of a user’s voice being identified among 50 people by 83.7 percent while keeping the drop in speech recognition accuracy within 19.1 percent. The privacy level can also be easily relaxed to further improve speech recognition accuracy.
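
The abstract mentions DTW-based keyword spotting as the step that localizes sensitive words in the audio. As a rough illustration of that general technique only (not the paper's actual algorithm), the sketch below uses MFCC features and a simple subsequence DTW to find where a spoken keyword template best matches inside a longer recording. The file names, feature settings, and use of librosa are assumptions for the example.

```python
# Minimal sketch of DTW-based keyword spotting, assuming MFCC features and
# a plain subsequence-DTW search; this is illustrative, not the paper's code.
import numpy as np
import librosa


def mfcc_frames(path, sr=16000, n_mfcc=13):
    """Load audio and return an (n_mfcc, n_frames) MFCC feature matrix."""
    y, sr = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)


def subsequence_dtw(query, target):
    """Find the target frame span that best matches the query template.

    query:  (d, N) features of the keyword template
    target: (d, M) features of the full utterance
    Returns (start_frame, end_frame, cost) of the best match.
    """
    N, M = query.shape[1], target.shape[1]
    # Frame-wise Euclidean distances, shape (N, M).
    dist = np.linalg.norm(query[:, :, None] - target[:, None, :], axis=0)

    # Accumulated cost; the match may start at any target frame, so the
    # first row carries only the local cost (no accumulation along it).
    D = np.full((N, M), np.inf)
    D[0, :] = dist[0, :]
    for i in range(1, N):
        for j in range(M):
            best_prev = D[i - 1, j]                      # vertical step
            if j > 0:
                best_prev = min(best_prev, D[i, j - 1],  # horizontal step
                                D[i - 1, j - 1])         # diagonal step
            D[i, j] = dist[i, j] + best_prev

    end = int(np.argmin(D[-1, :]))   # match may end at any target frame
    # Rough start estimate: assume the match spans about N frames; a full
    # implementation would backtrack the warping path instead.
    start = max(0, end - N + 1)
    return start, end, float(D[-1, end])


# Hypothetical usage: locate the keyword, then the matched frame span could
# be replaced with a synthesized "safe" word as described in the abstract.
keyword = mfcc_frames("keyword_template.wav")
utterance = mfcc_frames("recording.wav")
s, e, cost = subsequence_dtw(keyword, utterance)
print(f"best match: frames {s}-{e}, DTW cost {cost:.1f}")
```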
