Abstract

We define and implement a novel side-channel attack that exploits a smartphone’s accelerometer to eavesdrop on entire words that the device itself reproduces through its loudspeakers. The proposed approach consists of two modules: (i) a deep learning-based system that uses a Convolutional Neural Network (CNN) to recognize a set of significant speech units from the spectrogram representation of the corresponding acceleration signals; (ii) an evolutionary segmentation method that, given the accelerometer measurements corresponding to an input speech, finds the best way to split them so that the CNN maintains high classification performance on each resulting segment, guaranteeing the recognition of a significant percentage of the words in the original speech. Experiments performed to assess the effectiveness of the proposed attack show that the percentage of recognized words is higher for short speeches and diminishes as the speeches get longer. We experimented with speeches ranging from 5 to 60 s in length, obtaining a recognition percentage of about 80% for the shortest speeches, decreasing to about 54% for the longest ones.
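
As an illustration of the first module's input, the following minimal sketch computes a log-magnitude spectrogram from a one-dimensional acceleration trace using a short-time Fourier transform. It is not the authors' pipeline: the function name `accel_spectrogram`, the 500 Hz sampling rate, the window and hop lengths, and the synthetic 50 Hz test signal are all assumptions chosen for the example.

```python
import numpy as np

def accel_spectrogram(signal, fs, win_len=128, hop=32):
    """Return (freqs, times, spec_db) for a 1-D signal.

    spec_db has shape (n_freq_bins, n_frames) and holds power in dB.
    win_len and hop are illustrative defaults, not values from the paper.
    """
    signal = np.asarray(signal, dtype=float)
    if signal.size < win_len:
        raise ValueError("signal is shorter than one analysis window")
    window = np.hanning(win_len)
    n_frames = 1 + (signal.size - win_len) // hop
    # Slice the signal into overlapping, Hann-windowed frames.
    frames = np.stack([
        signal[i * hop : i * hop + win_len] * window
        for i in range(n_frames)
    ])
    # Power spectrum of each frame (real-input FFT keeps non-negative bins).
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
    times = (np.arange(n_frames) * hop + win_len / 2) / fs
    # Small constant avoids log(0) on silent frames.
    spec_db = 10.0 * np.log10(power.T + 1e-12)
    return freqs, times, spec_db

# Synthetic stand-in for one accelerometer axis (assumed 500 Hz sampling).
fs = 500.0
t = np.arange(0, 2.0, 1.0 / fs)
z_axis = np.sin(2 * np.pi * 50.0 * t)

freqs, times, spec_db = accel_spectrogram(z_axis, fs)
peak_hz = freqs[spec_db.mean(axis=1).argmax()]
```

An array like `spec_db` could then be treated as a single-channel image and passed to a CNN classifier. How the acceleration axes are combined, and which spectrogram parameters the authors actually use, is not stated in the abstract.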
