Abstract
Due to significant data security concerns in machine learning, such as the data silo problem, there has been a growing trend toward developing privacy-preserving machine learning applications. The first step in training on data held across silos is a secure data join, specifically a private data join, which ensures the consistency and accuracy of the joint dataset. While most current research focuses on the private inner join, this paper addresses the privacy-preserving full join and develops two-party unbalanced private data full join protocols using secure multi-party computation tools. Notably, our paper introduces a novel component, Private Match-and-Connect (PMC), which performs a union operation on the IDs and feature values and secret-shares the resulting union set. Each participant receives only its own share of the secret, which guarantees data security during the pre-processing phase. Furthermore, we propose the two-party ID-private data union protocol (IDPriU), which securely and accurately matches feature-value shares with ID shares and thereby enables data alignment. Our protocol represents a significant advancement in privacy-preserving data preprocessing for machine learning and in privacy-preserving federated queries. Building on Private Set Union (PSU), it extends private data joins beyond the inner join, to which they have so far been limited. We have implemented our protocol and obtained favorable experimental results in terms of both runtime and communication overhead.
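To make the intended output of a secret-shared full join concrete, the following is a minimal plaintext sketch of the *ideal functionality* only: it computes the union of IDs in the clear and splits every cell of the joined table into two additive shares, one per party. It is not PMC or IDPriU, which never let a single party see the joined table. The ring Z_{2^32}, the zero fill for missing features, one integer feature per party, and the helper names `share`, `reconstruct`, and `full_join_ideal` are all hypothetical choices for illustration.

```python
import secrets

MOD = 2**32  # assumed ring Z_{2^32} for additive secret shares


def share(x, mod=MOD):
    """Split integer x into two additive shares with (s0 + s1) % mod == x % mod."""
    s0 = secrets.randbelow(mod)  # uniformly random share for party 0
    s1 = (x - s0) % mod          # complementary share for party 1
    return s0, s1


def reconstruct(s0, s1, mod=MOD):
    """Recombine two additive shares into the original value."""
    return (s0 + s1) % mod


def full_join_ideal(table_a, table_b, fill=0):
    """Ideal functionality (NOT a secure protocol): full join in the clear, then
    secret-share every cell. table_a / table_b map ID -> integer feature value.
    Rows are sorted by ID only to make the sketch deterministic; a real
    protocol would also hide row order and which party contributed each ID."""
    ids = sorted(set(table_a) | set(table_b))  # union of IDs = full join keys
    shares_p0, shares_p1 = [], []
    for i in ids:
        # Missing features are filled with `fill` (hypothetical simplification).
        row = [i, table_a.get(i, fill), table_b.get(i, fill)]
        pairs = [share(v) for v in row]
        shares_p0.append([p[0] for p in pairs])  # what party 0 receives
        shares_p1.append([p[1] for p in pairs])  # what party 1 receives
    return shares_p0, shares_p1


if __name__ == "__main__":
    A = {101: 7, 102: 3}  # party A: ID -> feature (hypothetical data)
    B = {102: 9, 205: 4}  # party B: ID -> feature (hypothetical data)
    s0, s1 = full_join_ideal(A, B)
    rows = [[reconstruct(x, y) for x, y in zip(r0, r1)] for r0, r1 in zip(s0, s1)]
    print(rows)  # [[101, 7, 0], [102, 3, 9], [205, 0, 4]]
```

Each party's list on its own consists of uniformly random values; only the sum of both shares modulo 2^32 reveals a cell. Filling missing features with 0 is a simplification: a protocol might additionally share a presence flag so that a genuine 0 can be told apart from a missing value.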