Abstract

With the global prevalence of mobile devices, concerns about privacy breaches and data leakage on these devices are rising. Although mobile applications must request sensor permissions to access the outputs of most built-in sensors, motion sensors (e.g., the accelerometer and gyroscope) can be accessed directly without any permission. Existing studies have shown that motion sensors can leak confidential information such as passwords, digits, and voice-based commands, but whether intelligible speech waveforms can be synthesized from low-resolution motion-sensor readings has been understudied. In this paper, we present an escalated side-channel attack on built-in speakers that synthesizes intelligible speech waveforms from low-resolution vibration signals. In contrast to traditional classification formulations, we cast this task as a generative problem and introduce an end-to-end synthesis framework, dubbed AccMyrinx, to eavesdrop on the speaker via low-resolution vibration signals. In AccMyrinx, we introduce a data alignment solution that produces pair-wise voice-vibration sequences, and we present a wavelet-based MelGAN (WMelGAN) with multi-scale time-frequency-domain discriminators to generate intelligible acoustic waveforms. We conducted extensive experiments and demonstrated the feasibility of synthesizing intelligible acoustic signals from low-resolution solid-borne vibration signals. Compared with existing synthesis solutions, our approach outperforms the baselines on both subjective and objective metrics, with a smoothed word error rate of 42.67% and a Mel-Cepstral distortion of 0.298. In addition, the quality of the synthesized speech is affected by several factors, including gender, speech rate, volume, and sampling frequency.
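
As a concrete illustration of the objective metric reported above, the following is a minimal sketch of a Mel-Cepstral distortion (MCD) computation between a reference and a synthesized waveform. This is not the paper's evaluation code; the use of librosa MFCCs, the coefficient count, and the simple length truncation (rather than dynamic-time-warping alignment) are assumptions made for illustration only.

```python
# Hedged sketch of Mel-Cepstral Distortion (MCD) between two waveforms.
# NOTE: feature choice (MFCCs via librosa), n_mfcc, and frame alignment by
# truncation are illustrative assumptions, not the authors' exact pipeline.
import numpy as np
import librosa

def mel_cepstra(wav_path, sr=16000, n_mfcc=25):
    """Extract per-frame mel-cepstral coefficients (MFCCs as a stand-in)."""
    y, _ = librosa.load(wav_path, sr=sr)
    # Shape (frames, n_mfcc); drop the 0th (energy) coefficient, as is
    # conventional when computing MCD.
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T[:, 1:]

def mcd(ref, syn):
    """Average MCD in dB over frames; sequences are truncated to equal length."""
    n = min(len(ref), len(syn))
    diff = ref[:n] - syn[:n]
    const = 10.0 / np.log(10.0) * np.sqrt(2.0)
    return const * np.mean(np.sqrt(np.sum(diff ** 2, axis=1)))

# Example usage (hypothetical file names):
# score = mcd(mel_cepstra("ground_truth.wav"), mel_cepstra("synthesized.wav"))
```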
