PrivSuper: A Superset-First Approach to Frequent Itemset Mining under Differential Privacy

Ning Wang, Zhenjie Zhang, Ge Yu, Yu Gu, Yin Yang, Xiaokui Xiao

doi:10.1109/icde.2017.131

Abstract

Differential privacy, which has been applied in Google Chrome and Apple iOS, provides strong privacy assurance to users while retaining the capability to discover statistical patterns from sensitive data. We focus on top-k frequent itemset mining on sensitive data, with the goal of obtaining high result utility while satisfying differential privacy. There are two basic methodologies for designing a high-utility solution. The first uses generic differential privacy mechanisms as building blocks and minimizes result error through algorithm design; most existing work follows this approach. The second devises a new building block customized for frequent itemset mining. This is much more challenging: to our knowledge, only one recent work, NoisyCut, attempts to do so. Unfortunately, NoisyCut has been found to violate differential privacy. This paper proposes a novel solution, PrivSuper, which contains both a new algorithm and a new differential privacy mechanism. Unlike most existing methods, which follow the Apriori framework of starting from single items and iteratively forming larger itemsets, PrivSuper directly searches for maximal frequent itemsets and subsequently adds their sub-itemsets to the results without consuming additional privacy budget. During the search, PrivSuper applies a customized mechanism, which we call the sequence exponential mechanism (SEM), to extend the current itemset with one more item. Notably, SEM consumes no privacy budget at all if it turns out that the current itemset cannot be extended. Extensive experiments on several real datasets demonstrate that PrivSuper achieves significantly higher result utility than previous solutions.
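
The sub-itemset step mentioned in the abstract can be illustrated with a small sketch. It rests on the standard downward-closure property (every subset of a frequent itemset is itself at least as frequent), so the subsets can be listed from the already-released maximal itemsets without touching the sensitive data again. This is only an illustration under that assumption, not the authors' implementation: the function name `sub_itemsets` and the toy itemsets are hypothetical, and the sketch does not model SEM or any noise addition.

```python
from itertools import combinations


def sub_itemsets(maximal_itemsets):
    """Return every non-empty subset of the given itemsets.

    Hypothetical helper: it only post-processes itemsets that were
    already selected, and never reads the underlying transactions.
    """
    result = set()
    for itemset in maximal_itemsets:
        items = sorted(itemset)
        # Enumerate subsets of each size, from single items up to the full set.
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                result.add(frozenset(combo))
    return result


if __name__ == "__main__":
    # Toy maximal itemsets, assumed to come from a private search step.
    maximal = [{"a", "b", "c"}, {"c", "d"}]
    ordered = sorted(sub_itemsets(maximal), key=lambda s: (len(s), sorted(s)))
    for s in ordered:
        print(sorted(s))
```

Because the enumeration uses only the selected itemsets as input, it counts as post-processing in the differential privacy sense, which is consistent with the abstract's statement that this step needs no additional privacy budget.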

Full Text