Abstract

With the increasing popularity of crowdsourcing services, high-dimensional crowdsourced data provides a wealth of knowledge. However, it also exposes participants to unprecedented privacy threats, owing to complex correlations among multiple attributes and the vulnerabilities of untrusted crowdsourcing servers. Differential privacy-based paradigms have been proposed to release privacy-preserving datasets with statistical approximation. Nonetheless, most existing schemes are limited when facing highly correlated attributes, and cannot prevent privacy threats from untrusted crowdsourcing servers. To address these issues, we propose two novel solutions, namely LoCop and DR_LoCop, which guarantee local differential privacy based on the randomized response technique while synthesizing and releasing high-dimensional crowdsourced data with high data utility. In particular, LoCop leverages copula theory to synthesize high-dimensional crowdsourced data from univariate marginal distributions and attribute dependence. The univariate marginal distributions are estimated by a Lasso-based regression algorithm from aggregated privacy-preserving bit strings. Dependencies among attributes are modeled with a multivariate Gaussian copula. Building on LoCop, the enhanced solution DR_LoCop not only takes advantage of the C-vine copula to reflect conditional dependencies among high-dimensional attributes, but also achieves dimension reduction. Extensive experiments on real-world datasets demonstrate that our solutions substantially outperform state-of-the-art techniques in terms of both data utility and computational overhead.
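
To make the randomized response step concrete, the following is a minimal sketch rather than the LoCop implementation. It assumes each participant encodes one categorical attribute as a one-hot bit string and flips every bit independently with a budget of epsilon/2 per bit. The server-side estimator shown is a simple closed-form debiasing step used here for illustration; it stands in for, and is not, the Lasso-based regression mentioned above. The names `one_hot`, `perturb_bits`, and `estimate_frequencies` are hypothetical.

```python
import math
import random


def keep_probability(epsilon):
    # Assumption: each bit gets budget epsilon / 2, because two one-hot
    # strings differ in at most two positions.
    return math.exp(epsilon / 2) / (1 + math.exp(epsilon / 2))


def one_hot(value, domain_size):
    # Encode a category index as a bit string with a single 1.
    return [1 if j == value else 0 for j in range(domain_size)]


def perturb_bits(bits, epsilon, rng):
    # Client side: keep each bit with probability p, flip it otherwise.
    p = keep_probability(epsilon)
    return [b if rng.random() < p else 1 - b for b in bits]


def estimate_frequencies(reports, epsilon):
    # Server side: the expected share of 1s at position j is
    # (1 - p) + f_j * (2p - 1); invert that to recover f_j.
    p = keep_probability(epsilon)
    n = len(reports)
    estimates = []
    for j in range(len(reports[0])):
        observed = sum(r[j] for r in reports) / n
        estimates.append((observed - (1 - p)) / (2 * p - 1))
    return estimates


if __name__ == "__main__":
    rng = random.Random(0)
    values = rng.choices(range(3), weights=[0.5, 0.3, 0.2], k=20000)
    reports = [perturb_bits(one_hot(v, 3), 2.0, rng) for v in values]
    print(estimate_frequencies(reports, 2.0))
```

The server only ever sees the perturbed bit strings, which is what lets the scheme tolerate an untrusted aggregator. The estimates are unbiased but noisy, and may fall slightly outside [0, 1] for small samples or small epsilon.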
