Abstract

State-of-the-art edge devices can not only run inference with machine learning (ML) models but also train them on-device with local data. When this local data is sensitive, privacy becomes a crucial property, and sharing the data with a server for training should be avoided. Federated Learning (FL) helps in these situations; however, FL alone does not address every challenge, especially when privacy is the primary concern. We propose a privacy-preserving FL framework that leverages bitwise quantization, local differential privacy (LDP), and feature hashing for input representation in the collaborative training of ML models. In our approach, the local model updates are first quantized, and a randomized-response technique is then applied to the resulting quantized update vector. Although our proposed framework functions with arbitrary types of input features, we emphasize its usability with natural language data. The text input on the client side is encoded using a rolling-hash-based representation, which simultaneously addresses the high resource demands of embedding algorithms and the privacy risks of sharing sensitive data. We evaluate our method on a sentiment analysis task using the IMDB Movie Reviews dataset as well as on a rating prediction task with the MovieLens dataset augmented with additional movie keywords. We demonstrate that our approach is a feasible solution for private language-processing tasks on edge devices without resource-hungry language models or privacy-violating collection of client data.
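
The abstract only outlines the mechanism, so the following is a minimal sketch of how the privatized update step could look, assuming 1-bit sign quantization of the update vector and the standard binary randomized-response flip probability 1 / (1 + e^ε). The function names and parameters are illustrative, not taken from the paper.

```python
import numpy as np

def quantize_sign(update):
    """1-bit quantization: encode each coordinate of the local model
    update as 1 (non-negative) or 0 (negative)."""
    return (np.asarray(update) >= 0).astype(np.uint8)

def randomized_response(bits, epsilon, rng=None):
    """Apply binary randomized response to a bit vector.

    Each bit is kept with probability e^eps / (1 + e^eps) and flipped
    otherwise, which satisfies eps-local differential privacy per bit."""
    rng = rng or np.random.default_rng()
    p_keep = np.exp(epsilon) / (1.0 + np.exp(epsilon))
    keep = rng.random(bits.shape) < p_keep
    return np.where(keep, bits, 1 - bits).astype(np.uint8)

# Example: privatize a toy local update before sending it to the server.
update = np.array([0.8, -0.1, 0.3, -0.7])
private_bits = randomized_response(quantize_sign(update), epsilon=1.0)
```

For the text encoding, one common way to realize a rolling-hash-based representation is to slide a polynomial rolling hash over character n-grams and bucket the hashes into a fixed-size count vector (the hashing trick). The n-gram length, base, modulus, and output dimension below are assumed defaults, not values from the paper.

```python
def rolling_hash_features(text, dim=1024, n=3, base=257, mod=(1 << 61) - 1):
    """Map character n-grams of `text` into a fixed-size count vector
    using a polynomial rolling hash."""
    vec = np.zeros(dim, dtype=np.float32)
    if len(text) < n:
        return vec
    top = pow(base, n - 1, mod)  # weight of the character leaving the window
    h = 0
    for i, ch in enumerate(text):
        if i >= n:
            # drop the contribution of the character sliding out of the window
            h = (h - ord(text[i - n]) * top) % mod
        h = (h * base + ord(ch)) % mod
        if i >= n - 1:
            vec[h % dim] += 1.0
    return vec
```

In a setup like this, the client would share only hashed n-gram counts and randomized bits rather than raw text or exact gradients, which matches the privacy goal stated in the abstract.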
