Abstract

As more and more data-intensive applications are migrated into cloud environments, valuable intermediate datasets are commonly stored to avoid the high cost of re-computing them. However, this poses a risk to data privacy, because malicious parties may deduce private information about the parent or original dataset by analyzing some of the stored intermediate datasets. The traditional way to address this issue is to encrypt all of the stored datasets so that they are hidden. We argue that this is neither efficient nor cost-effective, because it is not necessary to encrypt all of those datasets, and encrypting large volumes of data can be very costly. In this paper, we propose a new approach to identify which stored datasets need to be encrypted and which do not. Based on an information-theoretic analysis, our approach derives an upper bound on a privacy measure: as long as the overall mixed information amount of a set of stored datasets is no more than that upper bound, those datasets need not be encrypted while privacy is still protected. A tree model is leveraged to analyze privacy disclosure among datasets, and privacy requirements are decomposed and satisfied layer by layer. With a heuristic implementation of this approach, evaluation results demonstrate that the cost of encrypting intermediate datasets decreases significantly compared with the traditional approach, while the privacy of the parent or original dataset remains protected.
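To make the decision rule concrete, the following is a minimal sketch, assuming a per-dataset leakage estimate (e.g. mutual information with the original dataset) and a tolerable leakage threshold. The names `select_datasets_to_encrypt`, `leakage`, and `epsilon` are illustrative assumptions, not the paper's actual notation, and the simple additive combination and greedy heuristic below are stand-ins for the paper's mixed-information measure and layer-by-layer decomposition.

```python
# Illustrative sketch of selective encryption under a privacy upper bound.
# Assumptions: leakage values are per-dataset estimates (e.g. mutual
# information with the original dataset, in bits) and are treated as
# roughly additive; the paper's "mixed information amount" may combine
# them differently.

def select_datasets_to_encrypt(leakage, epsilon):
    """Greedily pick intermediate datasets to encrypt until the joint
    leakage of the remaining unencrypted ones is within the upper bound.

    leakage : dict mapping dataset name -> estimated privacy leakage
    epsilon : upper bound on the overall tolerable leakage
    """
    unencrypted = dict(leakage)
    to_encrypt = []
    # Heuristic: encrypt the most revealing datasets first.
    while unencrypted and sum(unencrypted.values()) > epsilon:
        worst = max(unencrypted, key=unencrypted.get)
        to_encrypt.append(worst)
        del unencrypted[worst]
    return to_encrypt


if __name__ == "__main__":
    # Hypothetical leakage estimates for four intermediate datasets.
    leakage = {"d1": 0.8, "d2": 0.5, "d3": 0.1, "d4": 0.05}
    print(select_datasets_to_encrypt(leakage, epsilon=0.3))  # -> ['d1', 'd2']
```

In this toy example only the two most revealing datasets are encrypted; the rest stay in plaintext because their combined leakage falls below the threshold, which is the cost saving the abstract refers to.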
