Abstract

Given a set of local sequential datasets held by multiple parties, we study the problem of publishing a synthetic dataset that approximately preserves the sequentiality information of the integrated dataset while satisfying differential privacy for each local dataset. Existing solutions for publishing differentially private sequential data in the centralized setting mostly adopt tree-based approaches, which rely on different tree structures that encode the statistical information of the sequential data. A tree is normally constructed by recursively splitting nodes whose noisy scores (e.g., entropy or count) are larger than a given threshold. However, extending similar ideas to the multi-party setting is challenging. First, the comparison between noisy scores and a given threshold needs to be done in a distributed manner without letting the parties know the noisy scores, while satisfying differential privacy for each local dataset. Second, in the multi-party setting the large number of node splitting decisions incurs prohibitive computation costs. To address these challenges, we present DPST, a distributed prediction suffix tree construction solution. In DPST, we first introduce a novel node splitting decision method that calculates the comparison result under encryption with substantially improved efficiency. We then present a novel batch-based tree construction approach to reduce the computation costs. To achieve high parallel performance without incurring any extra communication cost, we introduce the conjunction and slide methods, which ensure that each batch contains a stable number of carefully arranged decision tasks. Extensive experiments on real datasets demonstrate that our DPST solution offers desirable data utility with low computation and communication costs.
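
As an illustration of the centralized baseline described above (recursively splitting nodes whose noisy scores exceed a threshold), the following is a minimal Python sketch. It is not the DPST protocol: it runs on a single party's data, performs no encryption, and uses a noisy occurrence count as the score. All names (`build_noisy_tree`, `noise_scale`, `max_depth`) are hypothetical, and calibrating `noise_scale` to the score's sensitivity and the privacy budget is assumed to be done elsewhere.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two Exp(1) draws follows Laplace(0, 1); multiply by scale.
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def count_occurrences(sequences, context):
    # Number of times `context` occurs as a contiguous run across all sequences.
    k = len(context)
    total = 0
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            if tuple(seq[i:i + k]) == context:
                total += 1
    return total


def build_noisy_tree(sequences, alphabet, threshold, noise_scale, max_depth, rng):
    """Grow a suffix-context tree level by level (centralized sketch).

    A node is a context (tuple of symbols). It is split only if its noisy
    count exceeds `threshold` and it is shallower than `max_depth`.
    Returns a dict mapping each created context to its noisy count.
    """
    tree = {}
    frontier = [(s,) for s in alphabet]  # depth-1 contexts
    depth = 1
    while frontier:
        next_frontier = []
        for context in frontier:
            noisy = count_occurrences(sequences, context) + laplace_noise(noise_scale, rng)
            tree[context] = noisy
            if noisy > threshold and depth < max_depth:
                # Children extend the context one symbol further into the past.
                next_frontier.extend((s,) + context for s in alphabet)
        frontier = next_frontier
        depth += 1
    return tree


if __name__ == "__main__":
    rng = random.Random(0)
    tree = build_noisy_tree(["abab", "abb"], "ab", threshold=2,
                            noise_scale=1.0, max_depth=2, rng=rng)
    for context, score in sorted(tree.items()):
        print("".join(context), round(score, 2))
```

In this sketch every splitting decision reads the noisy score in the clear. That is exactly what the multi-party setting rules out, which is why the comparison has to be carried out under encryption instead.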
