Abstract

Data with unbalanced multivariate nominal attributes collected from a large number of users provide a wealth of knowledge for our society. However, it also poses an unprecedented privacy threat to participants. Local differential privacy, a variant of differential privacy, is proposed to eliminate the privacy concern by aggregating only randomized values from each user, with the provision of plausible deniability. However, traditional local differential privacy algorithms usually assign the same privacy budget to attributes with different dimensions, leading to large data utility loss and high communication costs. To obtain highly accurate results while satisfying local differential privacy, the aggregator needs a reasonable privacy budget allocation scheme. In this paper, the Lagrange multiplier (LM) algorithm was used to transform the privacy budget allocation problem into a problem of calculating the minimum value from unconditionally constrained convex functions. The solution to the nonlinear equation obtained by the Cardano formula (CF) and Newton-Raphson (NS) methods was used as the optimal privacy budget allocation scheme. Then, we improved two popular local differential privacy mechanisms by taking advantage of the proposed privacy budget allocation techniques. Extension simulations on two different data sets with multivariate nominal attributes demonstrated that the scheme proposed in this paper can significantly reduce the estimation error under the premise of satisfying local differential privacy.

Highlights

  • At present, with the development of various integrated sensors and crowdsensing systems, crowdsourced information from all aspects can be collected and analyzed among various data attributes to better produce rich knowledge about a group, benefiting everyone in the crowdsourcing system [1]

  • To meet the local privacy guarantee and the different needs of different data for the privacy concern levels, we use the optimal privacy budget allocation scheme obtained by the above procedure to improve the binary randomized response (BRR) and multivariate randomized response (MRR) and call it the OBRR and OMRR, respectively, which are optimal in the high and low privacy regimes with high-dimensional heterogeneous data, respectively

  • Related work This paper focuses on the frequency statistics problem of high-dimensional heterogeneous data with local differential privacy, which refers to the situation where each user sends multiple variable values voted from candidate attributes

Read more

Summary

Introduction

With the development of various integrated sensors and crowdsensing systems, crowdsourced information from all aspects can be collected and analyzed among various data attributes to better produce rich knowledge about a group, benefiting everyone in the crowdsourcing system [1]. With high-dimensional heterogeneous data (data with unbalanced multivariate nominal attributes), there are many hidden rules and much hidden information behind the data that can be mined to provide better services for individuals or groups. In the process of providing cloud services, user gender, age, and habits when using operating systems and browsers should be deeply explored to provide different special services to different user groups. (2020) 10:25 and genetic information [2] can be followed closely to better diagnose and monitor patient health status Sci. (2020) 10:25 and genetic information [2] can be followed closely to better diagnose and monitor patient health status

Objectives
Findings
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call