Abstract

In an era of big data, online services are becoming increasingly data-centric; they collect, process, analyze, and disclose growing amounts of personal data in the form of pseudonymized data sets. It is crucial that such systems are engineered both to protect individual user (data subject) privacy and to give control of personal data back to the user. For pseudonymized data, this means that unwanted parties should not be able to deduce sensitive information about the user. However, the plethora of pseudonymization algorithms and tuneable parameters that currently exist makes it difficult for a non-expert developer (data controller) to understand and realise strong privacy guarantees. In this paper we propose a principled Model-Driven Engineering (MDE) framework to model data services in terms of their pseudonymization strategies and to identify risks of breaches of user privacy. A developer can explore alternative pseudonymization strategies and determine their effectiveness in terms of quantifiable metrics: i) violations of privacy requirements for every user in the current data set; ii) the trade-off between conforming to these requirements and the usefulness of the data for its intended purposes. We demonstrate through an experimental evaluation that the information provided by the framework is useful, particularly in complex situations where privacy requirements differ between users, and that it can inform decisions to optimize a chosen strategy in comparison to applying an off-the-shelf algorithm.
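To make the two metrics above concrete, the following is a minimal sketch (not the paper's implementation) of how per-user privacy violations and a privacy/utility trade-off might be computed. It assumes k-anonymity as the illustrative privacy criterion, with each user stating a personal minimum group size; all record fields, user names, and the default minimum k of 2 are hypothetical.

```python
from collections import Counter

def k_anonymity_violations(records, quasi_ids, min_k):
    """Metric i): for each record, count how many records share its
    quasi-identifier values, and flag users whose personal minimum
    group size (k) is not met in the current data set."""
    groups = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    violations = []
    for r in records:
        k = groups[tuple(r[q] for q in quasi_ids)]
        if k < min_k.get(r["user"], 2):  # assumed default minimum k of 2
            violations.append(r["user"])
    return violations

def utility_after_suppression(records, violations):
    """Metric ii), crudely: the fraction of records that survive when
    every violating record is suppressed before disclosure."""
    keep = [r for r in records if r["user"] not in violations]
    return len(keep) / len(records)

# Hypothetical pseudonymized data set with per-user privacy preferences.
records = [
    {"user": "u1", "age_band": "20-29", "zip3": "941"},
    {"user": "u2", "age_band": "20-29", "zip3": "941"},
    {"user": "u3", "age_band": "30-39", "zip3": "113"},
]
min_k = {"u1": 2, "u2": 2, "u3": 2}
v = k_anonymity_violations(records, ["age_band", "zip3"], min_k)
print(v)                                      # ['u3']
print(utility_after_suppression(records, v))  # 0.666...
```

A data controller could rerun this check under different generalization choices (coarser age bands, shorter zip prefixes) and compare the violation list against the utility score before deciding whether to disclose.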

Highlights

  • 6.2 Realistic privacy distribution: this use case features a larger dataset with 1103 realistic records; we show that the framework can be utilised to make decisions about data disclosure that take user privacy preferences into account.

  • The data controller or system designer may decide that this is an acceptable statistical impact and approve the sending of the data with violations removed; they may approve the sending of the original data; or they may opt instead to try an alternative pseudonymization algorithm.

  • 8 Conclusion: this paper has discussed the pseudonymization aspect of a software framework for measuring privacy risk in a multi-actor system. The framework identifies pseudonymization risk and provides understandable information to help a user without expert knowledge choose a pseudonymization strategy.


Introduction

The creation of novel, personalized and optimized data-centered applications and services typically requires the collection, analysis and disclosure of increasing amounts of data. Such systems will leverage data sets that include personal data, and their usage and disclosure represent a risk to a user's (data subject) privacy. This data may be disclosed to trusted parties or released into the public domain for social good (e.g. scientific and medical research); it may be used internally within systems to improve the provision of a service (e.g. to better manage resources, or to optimize a service for individual users). Systems should be developed to ensure that: i) each individual's privacy preferences are taken into account, and ii) risks to each and every user's privacy are minimized.
