Abstract

Privacy has become a major concern in data mining as the technique is applied in many important applications. Distributed privacy-preserving data mining (DPPDM) addresses this concern by protecting the private information of the members of a distributed system during data mining. Although DPPDM has been widely discussed in recent work, semi-supervised learning has received little attention in this field. In this paper, we propose a mixture-model-based semi-supervised DPPDM method. With this method, a site in a distributed system can initiate a learning process using its own labeled data and unlabeled data from all sites. During the process, no individual data of any site is revealed to the others, no information about the data can be traced back to a specific site, and only the initiating site learns the result. The method consists of two main steps: a parameter-masking privacy-preserving Expectation-Maximization (EM) algorithm and a mixture-model-based semi-supervised learning algorithm. Experiments on both synthetic and real-world data demonstrate the effectiveness of the proposed method.
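To make the idea of masked aggregation concrete, the following is a minimal sketch and not the protocol from the paper, whose details the abstract does not give. It assumes a one-dimensional Gaussian mixture, a ring of sites simulated in one process, and a simple additive random mask that only the initiating site can remove. All function names (`local_stats`, `masked_ring_sum`, `private_em`) and parameter values are hypothetical. The semi-supervised labeling step is omitted.

```python
import numpy as np

def local_stats(x, weights, means, variances):
    """E-step on one site's private data; returns per-component sufficient statistics."""
    # Gaussian density of every point under every component, shape (n, K)
    dens = weights * np.exp(-0.5 * (x[:, None] - means) ** 2 / variances)
    dens = dens / np.sqrt(2 * np.pi * variances)
    resp = dens / dens.sum(axis=1, keepdims=True)  # responsibilities
    # Rows: soft counts, weighted sums of x, weighted sums of x^2
    return np.stack([resp.sum(axis=0), resp.T @ x, resp.T @ (x ** 2)])

def masked_ring_sum(per_site_stats, rng):
    """Sum statistics around a ring so that sites only see a masked running total."""
    # Assumption: a real-valued additive mask for illustration only; a real
    # protocol would typically use modular arithmetic for stronger guarantees.
    mask = rng.normal(scale=1e3, size=per_site_stats[0].shape)
    running = mask + per_site_stats[0]   # initiator starts with its masked statistics
    for stats in per_site_stats[1:]:     # each other site adds its own statistics
        running = running + stats
    return running - mask                # only the initiator knows the mask

def private_em(site_data, k=2, iters=50, seed=0):
    """EM in which the initiator (site 0) only ever sees aggregated statistics."""
    rng = np.random.default_rng(seed)
    own = site_data[0]
    # The initiator initializes parameters from its own data only
    means = np.quantile(own, np.linspace(0.1, 0.9, k))
    variances = np.full(k, own.var())
    weights = np.full(k, 1.0 / k)
    for _ in range(iters):
        stats = [local_stats(x, weights, means, variances) for x in site_data]
        n_k, s1, s2 = masked_ring_sum(stats, rng)
        # M-step from the aggregated statistics
        weights = n_k / n_k.sum()
        means = s1 / n_k
        variances = np.maximum(s2 / n_k - means ** 2, 1e-6)
    return weights, means, variances

# Synthetic demo: three sites, each holding samples from the same two clusters
demo_rng = np.random.default_rng(1)
sites = [
    np.concatenate([demo_rng.normal(-3, 1, 200), demo_rng.normal(3, 1, 200)])
    for _ in range(3)
]
weights, means, variances = private_em(sites)
```

In this sketch the per-site statistics never leave a site unmasked, and the M-step depends only on their sum, which is why the aggregate is enough for the initiating site to update the mixture parameters.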
