Abstract

The MapReduce framework is a popular way to process large-scale data, but it carries a significant risk of leaking users' personal information, especially when the data is sensitive, for example health records or salary information. Differential privacy has recently emerged as a new paradigm for protecting private data, making it possible to provide strong theoretical guarantees on both the privacy and the utility of query results. In this paper, we focus on the top-k query, one of the most useful queries over big data sets in the MapReduce framework. We propose an efficient algorithm, called DiffMR (Differentially private Top-k query over MapReduce), that processes top-k queries while satisfying differential privacy. To avoid privacy leakage in the intermediate steps, the algorithm uses the exponential mechanism, driven by a score function, to select top-k records from big data sets. When the data set is too large to obtain a reasonably accurate result, we can reduce the rejection rate and run additional MapReduce rounds to obtain a more accurate top-k result. After obtaining the final top-k candidate result, we add Laplace noise to each record and apply a post-processing technique to improve the accuracy of the query answers. Our experimental study demonstrates that DiffMR can answer top-k queries accurately in the MapReduce framework.
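
The two privacy primitives named above, exponential-mechanism selection followed by Laplace noise, can be illustrated with a minimal single-process sketch. It is not the DiffMR implementation: the function names, the sensitivity of 1, the even split of the privacy budget across the k picks, and the use of each record's score as the released value are all assumptions made for illustration. The MapReduce rounds, the rejection step, and the post-processing technique are not reproduced because the abstract does not specify them.

```python
import math
import random


def exponential_mechanism_top_k(scores, k, epsilon, sensitivity=1.0, rng=random):
    """Pick up to k distinct keys, one at a time, via the exponential mechanism.

    scores: dict mapping record key -> score (higher is better).
    epsilon: selection budget, split evenly over the k picks (an assumption).
    """
    remaining = dict(scores)
    eps_per_pick = epsilon / k
    chosen = []
    for _ in range(min(k, len(remaining))):
        keys = list(remaining)
        top = max(remaining.values())  # shift scores for numerical stability
        weights = [
            math.exp(eps_per_pick * (remaining[key] - top) / (2.0 * sensitivity))
            for key in keys
        ]
        # Sample one key with probability proportional to its weight.
        pick = rng.choices(keys, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]
    return chosen


def laplace_noise(scale, rng=random):
    """Sample Laplace(0, scale) as the difference of two unit exponentials."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def private_top_k(scores, k, eps_select, eps_noise, sensitivity=1.0, rng=random):
    """Select k candidates privately, then release their scores with noise."""
    chosen = exponential_mechanism_top_k(scores, k, eps_select, sensitivity, rng)
    # The noise budget is shared by all released values (an assumption).
    scale = sensitivity * len(chosen) / eps_noise
    return [(key, scores[key] + laplace_noise(scale, rng)) for key in chosen]
```

For example, `private_top_k({"a": 100.0, "b": 90.0, "c": 5.0}, k=2, eps_select=1.0, eps_noise=1.0)` returns two `(key, noisy_score)` pairs. Smaller budgets make both the selection and the released scores noisier, which is the privacy and accuracy trade-off the abstract refers to.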
