Abstract

With the global prevalence of mobile devices, concerns about privacy breaches and data leakage on these devices are rising. Although mobile applications must request sensor permissions to access the outputs of most built-in sensors, motion sensors (e.g., the accelerometer and gyroscope) can be accessed directly without any permission. Existing studies have shown that motion sensors can leak confidential information such as passwords, digits, and voice-based commands, but whether intelligible speech waveforms can be synthesized from low-resolution motion-sensor readings has been understudied. In this paper, we present an escalated side-channel attack on built-in speakers that synthesizes intelligible speech waveforms from low-resolution vibration signals. In contrast to traditional classification formulations, we cast this task as a generative problem and introduce an end-to-end synthesis framework, dubbed AccMyrinx, to eavesdrop on the speaker via low-resolution vibration signals. In AccMyrinx, we introduce a data alignment solution that produces pair-wise voice-vibration sequences, and we present a wavelet-based MelGAN (WMelGAN) with multi-scale time-frequency-domain discriminators to generate intelligible acoustic waveforms. We conducted extensive experiments and demonstrated the feasibility of synthesizing intelligible acoustic signals from low-resolution solid-borne vibration signals. Compared with existing synthesis solutions, our approach outperforms the baselines on both subjective and objective metrics, with a smoothed word error rate of 42.67% and a Mel-Cepstral distortion of 0.298. In addition, the quality of the synthesized speech is affected by several factors, including gender, speech rate, volume, and sampling frequency.
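
As a concrete illustration of the objective metric reported above, the following is a minimal sketch of a Mel-Cepstral distortion (MCD) computation between a reference and a synthesized waveform. This is not the paper's evaluation code; the use of librosa MFCCs, the coefficient count, and the simple length truncation (rather than dynamic-time-warping alignment) are assumptions made for illustration only.

```python
# Hedged sketch of Mel-Cepstral Distortion (MCD) between two waveforms.
# NOTE: feature choice (MFCCs via librosa), n_mfcc, and frame alignment by
# truncation are illustrative assumptions, not the authors' exact pipeline.
import numpy as np
import librosa

def mel_cepstra(wav_path, sr=16000, n_mfcc=25):
    """Extract per-frame mel-cepstral coefficients (MFCCs as a stand-in)."""
    y, _ = librosa.load(wav_path, sr=sr)
    # Shape (frames, n_mfcc); drop the 0th (energy) coefficient, as is
    # conventional when computing MCD.
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T[:, 1:]

def mcd(ref, syn):
    """Average MCD in dB over frames; sequences are truncated to equal length."""
    n = min(len(ref), len(syn))
    diff = ref[:n] - syn[:n]
    const = 10.0 / np.log(10.0) * np.sqrt(2.0)
    return const * np.mean(np.sqrt(np.sum(diff ** 2, axis=1)))

# Example usage (hypothetical file names):
# score = mcd(mel_cepstra("ground_truth.wav"), mel_cepstra("synthesized.wav"))
```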
