The widespread use of pre-trained language models has revolutionised knowledge transfer in natural language processing tasks. However, there is a concern regarding potential breaches of user trust due to the risk of re-identification attacks, where malicious users could extract Personally Identifiable Information (PII) from other datasets. To assess the extent of personal information extractable from popular pre-trained models, we conduct the first wide-coverage evaluation and comparison of state-of-the-art privacy-preserving algorithms on a large multi-lingual sentiment analysis dataset annotated with demographic information (including location, age, and gender). Our results suggest a link between model complexity, pre-training data volume, and the efficacy of privacy-preserving embeddings. We find that privacy-preserving methods are more effective when applied to larger and more complex models, with improvements exceeding 20% over non-private baselines. We also observe that local differential privacy imposes a serious performance penalty of ≈20% in our test setting, which can be mitigated using hybrid or metric-DP techniques.
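As a minimal, illustrative sketch of the kind of metric-DP text privatisation the abstract refers to (not the paper's actual implementation), the snippet below perturbs a word embedding with noise whose density decays as exp(-ε‖z‖) and decodes it back to the nearest vocabulary word; the vocabulary, embedding matrix, dimensionality, and ε value are all hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary with d-dimensional embeddings (illustrative only).
d = 50
vocab = ["good", "bad", "service", "city", "visit"]
embeddings = rng.normal(size=(len(vocab), d))

def privatize(word: str, eps: float) -> str:
    """Perturb a word's embedding under metric DP and return the nearest vocabulary word."""
    v = embeddings[vocab.index(word)]
    # Noise with density proportional to exp(-eps * ||z||):
    # uniform random direction, magnitude drawn from Gamma(d, 1/eps).
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    magnitude = rng.gamma(shape=d, scale=1.0 / eps)
    noisy = v + magnitude * direction
    # Decode to the closest embedding in the vocabulary.
    dists = np.linalg.norm(embeddings - noisy, axis=1)
    return vocab[int(np.argmin(dists))]

print(privatize("good", eps=10.0))  # smaller eps -> more noise, stronger privacy, lower utility
```

The sketch only illustrates the privacy-utility trade-off discussed above: lowering ε increases the noise magnitude, which is the source of the utility loss that hybrid or metric-DP approaches aim to reduce.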