Abstract

Data privacy has become a widely discussed issue in recent years, as data breaches and privacy scandals occur with increasing frequency. This raises many concerns about how data is acquired and about potential information leaks. In the field of Artificial Intelligence (AI) in particular, the widespread use of AI models aggravates the vulnerability of user privacy, because a considerable portion of the user data these models consume is represented in natural language. In the past few years, many researchers have proposed NLP-based methods to address these data privacy challenges. To the best of our knowledge, this is the first interdisciplinary review discussing privacy preservation in the context of NLP. In this paper, we present a comprehensive review of previous research on the techniques and challenges of building and testing privacy-preserving systems in the context of Natural Language Processing (NLP). We group the different works into four categories: 1) data privacy in the medical domain, 2) privacy preservation in the technology domain, 3) analysis of privacy policies, and 4) privacy leak detection in text representations. This review compares the contributions and pitfalls of the various NLP-based privacy violation detection and prevention works to help guide a path ahead.

Highlights

  • Data privacy is a highly discussed issue, and we encounter data breaches and privacy scandals in our day-to-day lives

  • There are many opportunities for the privacy of data to be violated when it is used in Artificial Intelligence (AI) models; for example, an adversary could eavesdrop on the latent representation of the input to a Machine Learning (ML) model and obtain sensitive information

  • This paper provides an overview of past works where Natural Language Processing (NLP) was used to identify privacy leaks, help build a system for privacy preservation, and identify techniques and challenges of building and testing privacy-preserving systems
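As a purely illustrative sketch (not taken from the paper, and using synthetic data), the eavesdropping threat in the second highlight can be simulated: an adversary who observes a model's latent representations trains a simple attack classifier to recover a sensitive attribute that was never an explicit model output. The leak strength, embedding dimension, and nearest-centroid attack below are all hypothetical choices made for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_embeddings(n, dim=16, leak=1.5):
    """Synthetic latent vectors whose geometry weakly encodes a
    hidden binary sensitive attribute (e.g., a demographic bit):
    the attribute shifts the mean of one embedding dimension."""
    attr = rng.integers(0, 2, size=n)   # sensitive attribute, unseen by the adversary at test time
    emb = rng.normal(size=(n, dim))
    emb[:, 0] += leak * attr            # the information leak lives in dimension 0
    return emb, attr

train_x, train_y = make_embeddings(2000)
test_x, test_y = make_embeddings(500)

# Adversary's attack: fit one centroid per attribute value on observed
# (embedding, attribute) pairs, then classify fresh embeddings by distance.
c0 = train_x[train_y == 0].mean(axis=0)
c1 = train_x[train_y == 1].mean(axis=0)
pred = (np.linalg.norm(test_x - c1, axis=1)
        < np.linalg.norm(test_x - c0, axis=1)).astype(int)

accuracy = (pred == test_y).mean()
print(f"attack accuracy: {accuracy:.2f}")  # well above the 0.5 chance level
```

Even this trivial attack recovers the sensitive bit far better than chance, which is why several of the surveyed works focus on detecting and suppressing such leaks in text representations.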

Summary

INTRODUCTION

Data privacy is a highly discussed issue, and we encounter data breaches and privacy scandals in our day-to-day lives. This is mainly due to the exponentially increasing collection of data and its use in various applications and research. There is a potential risk of exposure of medical records while they are stored in online databases or shared between institutions. Other fields highly susceptible to privacy leakage are social media networks, applications, and software. We divide the different applications into four categories: 1) data privacy in the medical domain, 2) privacy preservation in the technology domain, 3) analysis of privacy policies, and 4) privacy leak detection in text representations. We close with a conclusion that summarizes the review.

DATA PRIVACY IN MEDICAL DOMAIN
MACHINE LEARNING-BASED SYSTEMS
PRIVACY LEAKS DETECTION IN TEXT REPRESENTATION
DISCUSSION
CONCLUSION