Abstract
The fuzzy C-means clustering algorithm is one of the typical clustering algorithms in data mining applications. However, because datasets may contain sensitive information, there is a risk that user privacy is leaked during the clustering process. Fuzzy C-means clustering with differential privacy protection can protect users' individual privacy while mining patterns from the data, but the loss of utility caused by data perturbation is a common problem of these algorithms. To address the loss of accuracy caused by randomly initializing the membership matrix of fuzzy C-means, this paper first uses the maximum distance method to determine the initial center points. Then, the Gaussian value of each cluster center point is used to calculate the privacy budget allocation ratio. Finally, Laplace noise is added to complete the differential privacy protection. The experimental results demonstrate that the clustering accuracy and effectiveness of the proposed algorithm are higher than those of the baselines under the same privacy protection intensity.
Highlights
Data mining is used to extract some potentially useful information from a large amount of valid information
To address privacy leakage during clustering, this paper proposes a privacy budget allocation method based on the Gaussian kernel function and applies it to the fuzzy C-means algorithm, preserving the utility of the clustered data while preventing privacy leakage
The core idea of the algorithm is that, in each iteration of fuzzy C-means clustering, a privacy budget allocation method based on Gaussian weights is used to provide differential privacy protection for each cluster center point
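The idea in the last highlight can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the function names, the bandwidth `sigma`, the choice to evaluate each center's Gaussian kernel value against the mean of all centers, and the normalization of those values into budget shares are all hypothetical, because the text above does not give the exact allocation formula. The query sensitivity is likewise left as a caller-supplied parameter.

```python
import numpy as np

def gaussian_budget_shares(centers, sigma=1.0):
    """Return one budget share per center, proportional to a Gaussian kernel value.

    Assumption: the kernel is evaluated on the squared distance between each
    center and the mean of all centers. The paper may define it differently.
    """
    centers = np.asarray(centers, dtype=float)
    ref = centers.mean(axis=0)
    sq_dist = np.sum((centers - ref) ** 2, axis=1)
    weights = np.exp(-sq_dist / (2.0 * sigma ** 2))
    return weights / weights.sum()  # shares sum to 1

def perturb_centers(centers, epsilon, sensitivity, sigma=1.0, rng=None):
    """Add Laplace noise to each center using its share of the budget epsilon."""
    rng = np.random.default_rng() if rng is None else rng
    centers = np.asarray(centers, dtype=float)
    eps_per_center = epsilon * gaussian_budget_shares(centers, sigma)
    # Laplace mechanism: scale b_j = sensitivity / epsilon_j for center j.
    scale = sensitivity / eps_per_center
    noise = rng.laplace(loc=0.0, scale=scale[:, None], size=centers.shape)
    return centers + noise, eps_per_center

centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
noisy, eps = perturb_centers(centers, epsilon=1.0, sensitivity=1.0,
                             rng=np.random.default_rng(0))
```

Centers that receive a larger share get less noise, since the Laplace scale is inversely proportional to the allocated budget. In this sketch `sigma` should be on the scale of the data; a value that is too small drives some shares toward zero and the corresponding noise scale toward infinity.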
Summary
Data mining is used to extract potentially useful information from a large amount of valid information. To address privacy leakage during clustering, this paper proposes a privacy budget allocation method based on the Gaussian kernel function and applies it to the fuzzy C-means algorithm, preserving the utility of the clustered data while preventing privacy leakage. This provides a theoretical guarantee for users of fuzzy C-means, which can promote further research on and wider application of fuzzy C-means in academia and industry. The main steps of the fuzzy C-means algorithm are: Input: dataset D = {x_i}, i = 1, ..., n, and k. Output: U and C
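For orientation, standard fuzzy C-means (without any privacy protection) can be sketched as below, with the input D and k and the outputs U and C named above. This is a generic textbook version, not the paper's modified algorithm: it uses the random membership initialization that the paper sets out to replace, and the fuzzifier `m`, the iteration cap `max_iter`, and the tolerance `tol` are assumed parameters that do not appear in the text.

```python
import numpy as np

def fuzzy_c_means(D, k, m=2.0, max_iter=100, tol=1e-5, rng=None):
    """Plain fuzzy C-means. Returns membership matrix U (n x k) and centers C (k x d)."""
    rng = np.random.default_rng() if rng is None else rng
    D = np.asarray(D, dtype=float)
    n = D.shape[0]

    # Random initial membership matrix; each row is normalized to sum to 1.
    U = rng.random((n, k))
    U /= U.sum(axis=1, keepdims=True)

    for _ in range(max_iter):
        # Center update: weighted mean of the points with weights u_ij^m.
        Um = U ** m
        C = (Um.T @ D) / Um.sum(axis=0)[:, None]

        # Membership update: u_ij is proportional to d_ij^(-2 / (m - 1)).
        dist = np.linalg.norm(D[:, None, :] - C[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # avoid division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)

        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, C

# Usage on two synthetic, well-separated blobs (made-up data for illustration).
data_rng = np.random.default_rng(1)
D = np.vstack([data_rng.normal(0.0, 0.1, (20, 2)),
               data_rng.normal(5.0, 0.1, (20, 2))])
U, C = fuzzy_c_means(D, k=2, rng=np.random.default_rng(0))
labels = U.argmax(axis=1)  # hard assignment from the fuzzy memberships
```

Each row of U sums to 1, so a point's memberships can be read as soft assignments across the k clusters.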