Abstract

Local differential privacy (LDP) is a widely adopted privacy-protection model. It provides rigorous privacy guarantees and has been extensively studied for statistical analysis, especially frequency estimation. Google's Randomized Aggregatable Privacy-Preserving Ordinal Response (RAPPOR), a representative LDP-enabled frequency estimation algorithm, has been deployed in practice. However, its utility is sub-optimal for two reasons. First, its use of the MD5 hash function inevitably causes hash collisions, so distinct items can map to the same bits. Second, its randomized response mechanism injects random noise into every report. To improve the practical effectiveness of RAPPOR and the utility of frequency-based services, we propose PK-RAPPOR, an LDP-enabled frequency estimation method built around an effective re-encoding hash function (RE-HF) that incorporates prior knowledge (PK) about the rough frequency ranking of items. RE-HF divides items into several cohorts based on the PK and generates a unique hash value set for each item. Compared with the original RAPPOR, this eliminates hash collisions between items from different cohorts and reduces the effect of randomness through the overlapping of items within the same cohort. We validate the proposed method with theoretical analysis and demonstrate its effectiveness through experiments on both synthetic and real-world datasets.
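The two limitations named above can be seen in a minimal sketch of the baseline RAPPOR client and aggregator. This is not PK-RAPPOR or its RE-HF, whose details are not given here. The sketch assumes a simplified MD5-based Bloom-filter hash (`MD5("i:item") mod num_bits`, not Google's exact byte layout, and without cohorts). It uses the randomization parameters f, p and q from the original RAPPOR design and only estimates per-bit counts rather than running RAPPOR's full regression-based decoding. All function names are hypothetical.

```python
import hashlib
import random
from itertools import combinations


def bloom_bits(item: str, num_bits: int, num_hashes: int) -> frozenset[int]:
    """Bloom-filter positions for `item` (assumed scheme: MD5("i:item") mod num_bits).

    Google's RAPPOR client uses a different byte layout and also mixes in a cohort id.
    """
    positions = set()
    for i in range(num_hashes):
        digest = hashlib.md5(f"{i}:{item}".encode("utf-8")).digest()
        positions.add(int.from_bytes(digest[:4], "big") % num_bits)
    return frozenset(positions)


def colliding_pairs(items, num_bits, num_hashes):
    """Pairs of distinct items whose Bloom bit sets are identical (indistinguishable)."""
    codes = {item: bloom_bits(item, num_bits, num_hashes) for item in items}
    return [(a, b) for a, b in combinations(items, 2) if codes[a] == codes[b]]


def rappor_report(item, num_bits, num_hashes, f, p, q, rng):
    """One client report: Bloom encoding -> permanent RR -> instantaneous RR.

    Deployed RAPPOR memoizes the permanent step per client and value; this
    one-shot simulation draws it once per report.
    """
    bits = bloom_bits(item, num_bits, num_hashes)
    report = []
    for i in range(num_bits):
        b = 1 if i in bits else 0
        # Permanent randomized response: force 1 with prob f/2, force 0 with prob f/2.
        u = rng.random()
        if u < f / 2:
            b = 1
        elif u < f:
            b = 0
        # Instantaneous randomized response: send 1 with prob q if b == 1, else with prob p.
        report.append(1 if rng.random() < (q if b else p) else 0)
    return report


def estimate_bit_counts(reports, f, p, q):
    """Unbiased estimate of how many clients truly had each Bloom bit set."""
    n = len(reports)
    num_bits = len(reports[0])
    # P(reported bit = 1 | true bit = 0) after both randomization stages.
    p_star = 0.5 * f * (p + q) + (1 - f) * p
    # Gap between P(reported 1 | true 1) and P(reported 1 | true 0).
    scale = (1 - f) * (q - p)
    counts = [sum(r[i] for r in reports) for i in range(num_bits)]
    return [(c - n * p_star) / scale for c in counts]


if __name__ == "__main__":
    rng = random.Random(0)
    population = ["a"] * 12000 + ["b"] * 6000 + ["c"] * 2000
    reports = [rappor_report(w, 16, 2, f=0.25, p=0.25, q=0.75, rng=rng) for w in population]
    print([round(c) for c in estimate_bit_counts(reports, f=0.25, p=0.25, q=0.75)])
    print(colliding_pairs([f"item{k}" for k in range(40)], num_bits=16, num_hashes=2))
```

`colliding_pairs` lists items whose bit sets coincide, which the aggregated bit counts alone cannot separate. The noise from the two randomized-response stages shows up as the gap between estimated and true bit counts, and that gap shrinks only as the number of reports grows.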
