Abstract

The differential privacy k-means (DP k-means) clustering algorithm was developed to address privacy protection challenges in data mining. However, it struggles to achieve both clustering usability and convergence. The privacy budget (ε), a critical parameter that determines how much noise a differential privacy algorithm adds, has received significant attention. Consequently, researchers have turned to studying privacy budget allocation strategies within the DP k-means clustering algorithm. Selecting a privacy budget allocation strategy for DP k-means, however, is an NP-hard problem. Our initial intuition is that genetic algorithms can efficiently find near-optimal privacy budget sequences. On this basis, we propose a genetic algorithm-based privacy budget allocation strategy (GAPBAS) that ensures the convergence and usability of the DP k-means algorithm. First, convergence is ensured by selecting improved initial centroids and rigorously controlling the minimum privacy budget in the DP k-means algorithm. Second, the privacy budget allocation strategy is reformulated as a combinatorial optimization problem: the privacy budgets of the individual iterations are concatenated into a single sequence, and a genetic algorithm selects the optimal allocation, significantly improving the usability of the DP k-means algorithm. Comparative experiments against four other privacy budget allocation strategies for DP k-means show that GAPBAS performs better at the same level of privacy protection.
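The abstract does not give the algorithm itself, so the Python below is only a minimal sketch of the general idea (a genetic search over per-iteration budget sequences), not GAPBAS. It assumes: data normalized to [0, 1]^d; a DPLloyd-style update in which each round's budget is split equally between Laplace-noised cluster counts and cluster sums; sequential composition, so the per-round budgets sum to the total ε; and a floor `eps_min` on every round's budget. The names `dp_kmeans`, `decode`, `fitness` and `ga_allocate`, the SSE fitness, and the GA settings (tournament size 3, one-point crossover, Gaussian mutation, two elites) are hypothetical choices. Initial centroids are passed in rather than produced by the paper's improved initialization. Scoring fitness on the real data would itself consume privacy budget; the abstract does not say how GAPBAS accounts for this, and the sketch ignores it.

```python
import random

def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def sse(points, centroids):
    # Sum of squared distances from each point to its nearest centroid (lower is better).
    return sum(min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids) for p in points)

def dp_kmeans(points, init_centroids, budgets, rng):
    # DP Lloyd iterations: round t spends budgets[t]. Points are assumed to lie in
    # [0, 1]^d, so a cluster count has sensitivity 1 and a cluster sum has L1 sensitivity d.
    d = len(points[0])
    centroids = [list(c) for c in init_centroids]
    for eps_t in budgets:
        k = len(centroids)
        sums = [[0.0] * d for _ in range(k)]
        counts = [0] * k
        for p in points:
            j = min(range(k), key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
            counts[j] += 1
            for m in range(d):
                sums[j][m] += p[m]
        half = eps_t / 2  # split eps_t equally between the noisy count and the noisy sum
        for j in range(k):
            n = counts[j] + laplace(1.0 / half, rng)
            if n < 1:
                continue  # noisy count unusable: keep the previous centroid
            centroids[j] = [min(1.0, max(0.0, (sums[j][m] + laplace(d / half, rng)) / n))
                            for m in range(d)]
    return centroids

def decode(weights, eps_total, eps_min):
    # Hypothetical encoding: positive weights -> budgets that sum to eps_total, each >= eps_min.
    spare = eps_total - eps_min * len(weights)
    assert spare >= 0, "eps_total too small for this many rounds at eps_min"
    s = sum(weights)
    return [eps_min + spare * w / s for w in weights]

def fitness(weights, points, init, eps_total, eps_min, rng, trials=3):
    # Average SSE over a few noisy runs, since each DP run is random.
    budgets = decode(weights, eps_total, eps_min)
    return sum(sse(points, dp_kmeans(points, init, budgets, rng)) for _ in range(trials)) / trials

def ga_allocate(points, init, eps_total, eps_min, rounds, pop=20, gens=15, seed=0):
    # Genetic search over budget sequences; returns the best decoded sequence found.
    rng = random.Random(seed)
    population = [[rng.random() + 1e-6 for _ in range(rounds)] for _ in range(pop)]
    for gen in range(gens + 1):
        ranked = sorted(population, key=lambda w: fitness(w, points, init, eps_total, eps_min, rng))
        if gen == gens:
            return decode(ranked[0], eps_total, eps_min)
        nxt = ranked[:2]  # elitism: carry the two best sequences over unchanged
        while len(nxt) < pop:
            # Tournament selection: ranked is sorted best-first, so the smallest index wins.
            a = ranked[min(rng.sample(range(pop), 3))]
            b = ranked[min(rng.sample(range(pop), 3))]
            cut = rng.randrange(1, rounds) if rounds > 1 else 0
            child = a[:cut] + b[cut:]  # one-point crossover
            child = [max(1e-6, w + rng.gauss(0, 0.1)) if rng.random() < 0.2 else w
                     for w in child]  # Gaussian mutation, kept positive
            nxt.append(child)
        population = nxt

if __name__ == "__main__":
    data_rng = random.Random(1)
    # Two synthetic 2-D clusters around (0.2, 0.2) and (0.8, 0.8), clipped to [0, 1].
    data = [[min(1.0, max(0.0, c + data_rng.gauss(0, 0.05))) for _ in range(2)]
            for c in (0.2, 0.8) for _ in range(50)]
    budgets = ga_allocate(data, [data[0], data[-1]], eps_total=1.0, eps_min=0.05,
                          rounds=5, pop=10, gens=5)
    print([round(b, 3) for b in budgets], round(sum(budgets), 6))
```

Because every decoded sequence sums to `eps_total`, any sequence the search returns meets the same overall ε under sequential composition; the GA only changes how that ε is spread across iterations, while `eps_min` keeps late rounds from receiving so little budget that the noise swamps the centroid update.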
