Abstract

Scalable data processing platforms built on cloud computing are becoming increasingly attractive as infrastructure for supporting big data applications. However, privacy concerns remain one of the major obstacles to using public cloud platforms. Multidimensional anonymisation, a global-recoding generalisation scheme for privacy-preserving data publishing, has recently attracted attention because it can balance data obfuscation against usability. Existing multidimensional anonymisation methods scale poorly to big data because of their impractical serial I/O cost. Given the recursive nature of multidimensional anonymisation, parallelisation is a natural solution to these scalability issues. However, using existing distributed and parallel paradigms directly for recursive computation is still a challenge. In this paper, we propose a scalable approach to multidimensional anonymisation of big data based on MapReduce, a state-of-the-art data processing paradigm. Our basic idea is to use MapReduce to partition a data set recursively into smaller partitions until every partition fits in the memory of a single computing node. A tree indexing structure is proposed to support this recursive computation. Moreover, we show that our approach is applicable to differential privacy. Experimental results on real-life data demonstrate that our approach significantly improves the scalability of multidimensional anonymisation over existing methods.
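
The recursive partitioning idea can be illustrated with a minimal single-process sketch. It is not the MapReduce implementation the paper proposes: it assumes numeric quasi-identifiers, a Mondrian-style median split on the attribute with the widest range, and a hypothetical `capacity` parameter standing in for "fits in the memory of one computing node". The names `Node`, `choose_split`, `build_tree` and `leaves` are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Record = Tuple[float, ...]  # assumption: numeric quasi-identifier values only


@dataclass
class Node:
    """One node of a (hypothetical) partition tree; only leaves keep records."""
    records: List[Record]
    split_attr: Optional[int] = None
    split_value: Optional[float] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def choose_split(records: List[Record], k: int) -> Optional[Tuple[int, float]]:
    """Return (attribute index, lower median) for the widest attribute whose
    median split leaves at least k records on each side, or None."""
    dims = range(len(records[0]))
    by_width = sorted(
        dims,
        key=lambda d: max(r[d] for r in records) - min(r[d] for r in records),
        reverse=True,
    )
    for d in by_width:
        values = sorted(r[d] for r in records)
        value = values[(len(values) - 1) // 2]  # lower median
        n_left = sum(1 for r in records if r[d] <= value)
        if n_left >= k and len(records) - n_left >= k:
            return d, value
    return None


def build_tree(records: List[Record], k: int, capacity: int) -> Node:
    """Split recursively until a partition holds at most `capacity` records
    or no split keeps both sides at size >= k."""
    if k < 1:
        raise ValueError("k must be at least 1")
    node = Node(records)
    if len(records) <= capacity:
        return node  # small enough to be anonymised in memory
    split = choose_split(records, k)
    if split is None:
        return node  # cannot split further without violating k
    d, v = split
    node.split_attr, node.split_value = d, v
    node.left = build_tree([r for r in records if r[d] <= v], k, capacity)
    node.right = build_tree([r for r in records if r[d] > v], k, capacity)
    node.records = []  # inner nodes only index their children
    return node


def leaves(node: Node) -> List[Node]:
    """Collect the leaf partitions in left-to-right order."""
    if node.left is None or node.right is None:
        return [node]
    return leaves(node.left) + leaves(node.right)
```

Calling `build_tree(data, k=2, capacity=3)` on a list of `(age, zip_prefix)` tuples, for example, would return a tree whose leaves each hold at least two records; in a distributed setting each split of an oversized partition would presumably be carried out by a MapReduce job rather than a list comprehension.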
