Abstract

The pervasiveness of the Internet of Things results in vast volumes of personal data generated by the smart devices of users (data producers), such as smartphones, wearables and other embedded sensors. It is a common requirement, especially for Big Data analytics systems, to transfer these large-scale, distributed data to centralized computational systems for analysis. Nevertheless, the third parties that run and manage these systems (data consumers) do not always guarantee users’ privacy. Their primary interest is to improve utility, which is usually a metric related to performance, cost and quality of service. Several techniques mask user-generated data to ensure privacy, e.g. differential privacy. Setting up a process for masking data, referred to in this paper as a ‘privacy setting’, decreases the utility of data analytics on the one hand, while increasing privacy on the other. This paper studies parameterizations of privacy settings that regulate the trade-off between maximum utility, minimum privacy and minimum utility, maximum privacy, where utility refers to the accuracy in the estimations of aggregation functions. Privacy settings can be universally applied as system-wide parameterizations and policies (homogeneous data sharing). Nonetheless, they can also be applied autonomously by each user or decided under the influence of (monetary) incentives (heterogeneous data sharing). This latter diversity in data sharing by informational self-determination plays a key role in the privacy-utility trajectories, as shown in this paper both theoretically and empirically. A generic and novel computational framework is introduced for measuring privacy-utility trade-offs and their Pareto optimization. The framework computes a broad spectrum of such trade-offs that form privacy-utility trajectories under homogeneous and heterogeneous data sharing.
The practical use of the framework is experimentally evaluated using real-world data from a Smart Grid pilot project, in which energy consumers protect their privacy by regulating the quality of the shared power demand data, while utility companies make accurate estimations of the aggregate load in the network to manage the power grid. Over 20,000 differential privacy settings are applied to shape the computed trajectories, which in turn reveal a vast potential for data consumers and producers to participate in viable participatory data sharing systems.

Highlights

  • High data volumes are generated in real-time from users’ smart devices such as smartphones, wearables and embedded sensors

  • This paper studies parameterizations of privacy settings that regulate the trade-off between maximum utility, minimum privacy and minimum utility, maximum privacy, where utility refers to the accuracy in the estimations of aggregation functions

  • This paper studies the optimization of computational tradeoffs between privacy and utility that can be used to model information sharing as supply–demand systems run by computational markets [7,16]

Introduction

High data volumes are generated in real-time from users’ smart devices such as smartphones, wearables and embedded sensors. This paper studies the optimization of computational trade-offs between privacy and utility that can be used to model information sharing as supply–demand systems run by computational markets [7,16]. These trade-offs can be measured by the opportunity cost between privacy-preservation and the performance of algorithms operating on masked data, i.e. prediction accuracy. In contrast to related work that exclusively focuses on universal optimal privacy settings (homogeneous data sharing), this paper studies the optimization of privacy-utility trade-offs under diversity in data sharing (heterogeneous data sharing). This is a challenging but more realistic scenario for participatory data sharing systems that allow informational self-determination via freedom and autonomy in the amount and quality of data shared by each data producer.
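As a minimal illustration of this trade-off (a sketch under simplified assumptions, not the paper's framework), the snippet below applies a Laplace mechanism to each user's power demand reading before aggregation. The noise scale is sensitivity/ε, so a smaller ε (stronger privacy) inflates the error of the aggregate load estimate; all values and the ε choices are illustrative.

```python
import math
import random


def laplace_noise(scale):
    # Inverse-CDF sampling of a zero-mean Laplace distribution.
    u = random.random() - 0.5
    return -scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)


def mask(value, epsilon, sensitivity=1.0):
    # Laplace mechanism: noise scale = sensitivity / epsilon, so
    # stronger privacy (smaller epsilon) means lower utility.
    return value + laplace_noise(sensitivity / epsilon)


random.seed(42)
# Hypothetical per-user power demand readings in kW.
true_demand = [random.uniform(0.0, 1.0) for _ in range(10_000)]
true_total = sum(true_demand)

for epsilon in (0.1, 1.0, 10.0):  # illustrative privacy settings
    masked_total = sum(mask(d, epsilon) for d in true_demand)
    rel_error = abs(masked_total - true_total) / true_total
    print(f"epsilon={epsilon:5.1f}  relative aggregation error={rel_error:.4f}")
```

Sweeping many such ε values per user, instead of one system-wide value, is the heterogeneous-sharing regime the paper examines: each (privacy, error) pair is one point on a privacy-utility trajectory.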

Related work
Privacy-preserving mechanisms
Privacy and computational markets
Comparison and positioning
Problem definition
Framework
Experimental settings
Privacy evaluation
Utility evaluation
Electricity Customer Behavior Trial Dataset
Privacy mechanisms
Error analysis
Parameter analysis
Homogeneous system evaluation
Heterogeneous system evaluation
Findings
Conclusion and future work