Abstract

In this paper we investigate and mitigate the privacy risks of deep-neural-network-based feature extraction for sound classification in acoustic sensor networks. To this end, we analyze a single-label domestic activity monitoring scenario and a multi-label urban sound tagging scenario. We show that in both cases, the feature representations designed for sound classification also carry a significant amount of speaker-dependent information, thus posing serious privacy risks from speaker recognition attacks based on feature interception. We then propose to mitigate these privacy risks by introducing a variational information feature extraction scheme that allows sound classification while concurrently minimizing the feature representation's level of information and hence inhibiting speaker recognition attempts. We control and analyze the balance between the performance of the trusted and attacker tasks via the resulting model's composite loss function, its budget scaling factor, and its latent space size. It is empirically demonstrated that the proposed privacy-preserving feature representation generalizes well to both single-label and multi-label scenarios, with large as well as limited training-dataset resources. Furthermore, it exhibits robustness against x-vector-based, state-of-the-art speaker recognition attacks.
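The composite loss described above — a trusted-task classification loss plus a scaled information budget on the latent representation — can be sketched as follows. This is a minimal illustration assuming a diagonal-Gaussian latent code and a KL divergence to a standard normal prior as the information term (the usual variational-information-bottleneck formulation); the function names and the symbol `beta` for the budget scaling factor are illustrative, not taken from the paper.

```python
import numpy as np

def kl_gauss_std_normal(mu, logvar):
    # Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ),
    # summed over the latent dimensions; this bounds the information
    # carried by the latent code z.
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)

def cross_entropy(probs, y):
    # Negative log-likelihood of the true class (trusted-task loss).
    return -np.log(probs[np.arange(len(y)), y] + 1e-12)

def composite_loss(probs, y, mu, logvar, beta):
    # Trusted-task loss plus a beta-scaled information budget;
    # larger beta shrinks the latent code's information content,
    # trading classification accuracy for privacy.
    return np.mean(cross_entropy(probs, y) + beta * kl_gauss_std_normal(mu, logvar))
```

In this sketch, the privacy/utility balance is steered exactly as the abstract describes: through `beta` (the budget scaling factor) and through the latent dimensionality `mu.shape[-1]`.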
