Abstract
Deep learning techniques have been widely used in natural language processing (NLP) tasks and have made remarkable progress. However, training a deep learning model relies on a large amount of data, which may include sensitive information such as electronic medical records. An attacker can infer sensitive information from the trained model, which leads to privacy leakage. To solve this problem, we propose a Differentially Private Recurrent Variational AutoEncoder (DP-RVAE) that generates simulated data to be used in place of the sensitive dataset, thereby preserving privacy. To generate high-utility synthetic text, the model takes part of the sensitive text data as its conditional input and applies a dropout and noise perturbation mechanism to preserve differential privacy. In addition, we extend the proposed DP-RVAE to a federated learning setting and design a novel training paradigm for NLP tasks. Specifically, DP-RVAE is deployed on the client side to train on and generate personalized text. The client DP-RVAE models are aggregated and updated through the Federated Optimisation (FedOPT) algorithm so that personal information is well preserved. We evaluate the proposed DP-RVAE on a text classification task using the Tweets depression sentiment and IMDB reviews datasets. DP-RVAE achieves an average test accuracy that is 5.90% and 3.94% higher than the typical centralized training and federated learning approaches, respectively. We also perform a keyword inference attack experiment on a medical description dataset collected from the real world. Compared to the typical differentially private approach, DP-RVAE reduces the average attack accuracy by 15.2%. The experimental results demonstrate that DP-RVAE can be applied to NLP models to improve accuracy while preserving privacy.
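The server-side aggregation step mentioned above can be illustrated with a minimal sketch. It assumes the common FedOPT formulation in which the weighted average of client updates is treated as a pseudo-gradient and applied with a server learning rate; the function name `fedopt_server_step`, the flat-list weight representation, and the plain SGD server optimizer are illustrative assumptions, not details taken from the paper.

```python
from typing import List


def fedopt_server_step(
    global_weights: List[float],
    client_weights: List[List[float]],
    client_sizes: List[int],
    server_lr: float = 1.0,
) -> List[float]:
    """One hypothetical server round: average client deltas, then take a server step."""
    total = sum(client_sizes)
    # Pseudo-gradient: size-weighted average of (client weights - global weights).
    avg_delta = [0.0] * len(global_weights)
    for weights, n in zip(client_weights, client_sizes):
        for i, (w, g) in enumerate(zip(weights, global_weights)):
            avg_delta[i] += (n / total) * (w - g)
    # Server optimizer (assumed here to be plain SGD): move along the pseudo-gradient.
    return [g + server_lr * d for g, d in zip(global_weights, avg_delta)]


# Two clients holding 1 and 3 local examples, respectively.
new_global = fedopt_server_step(
    global_weights=[0.0, 0.0],
    client_weights=[[1.0, 2.0], [3.0, 4.0]],
    client_sizes=[1, 3],
)
```

With `server_lr=1.0` this reduces to size-weighted averaging of the client models; an adaptive server optimizer such as Adam could replace the final update line under the same interface.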