Abstract

Frequent sequence mining (FSM) is a fundamental component for analyzing sequential data in the big data era. However, collecting and analyzing sequence data incurs serious privacy issues for users. Local differential privacy (LDP) is a prevalent privacy paradigm for data statistics and analysis. Due to the high-dimensionality of sequence data, directly applying LDP on FSM faces severe challenges such as heavy perturbation, waste of privacy budget, poor accuracy, and high computational costs. This paper proposed PrivFSM and PrivFSM∗ which are two LDP-compliant FSM mechanisms with high data utility and low computational cost. Specifically, PrivFSM iteratively constructs sequences into a prefix tree under LDP to mine frequent sequences from huge domain. Besides, PrivFSM leverages a threshold-based pruning strategy to prune fake nodes, which reduces perturbations and computation costs. As an augmented version of PrivFSM, PrivFSM∗ no longer splits privacy budget, but partitions user set to avoid heavy noise. PrivFSM∗ maximizes the utilization of users’ data by applying the whole data on every iteration with a complete privacy budget. We conduct theoretical analysis on error-bound and extensive experiments on both real-world and synthetic datasets. The experimental results demonstrate the high data utility of our proposed algorithms when compared with other state-of-the-art mechanisms.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call