Abstract

Despite the increasing attention to big data, there are several domains where labeled data is scarce or too costly to obtain. For example, in information retrieval, gene analysis, and social network analysis, only training samples from the positive class are annotated, while the remaining unlabeled training samples consist of both unlabeled positive and unlabeled negative samples. Such positive and unlabeled (PU) data necessitates a mechanism for learning a two-class classifier from only one-class labeled data. Moreover, because data from these domains is highly sensitive and private, preserving the privacy of the training samples is essential. This paper addresses the challenge of private PU learning by designing a differentially private algorithm for positive and unlabeled data. We first propose a learning framework for the PU setting when the class prior probability is known, with a theoretical guarantee of convergence to the optimal classifier. We then propose a privacy-preserving mechanism for the designed framework, whose privacy and utility are established both theoretically and empirically.
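To make the PU setting concrete, the following is a minimal illustrative sketch (not the paper's algorithm) of the standard unbiased PU risk estimator for a linear scorer when the class prior π is known: the expected loss on negatives is rewritten using only labeled positives and unlabeled samples. All names here (`pu_risk`, `logistic_loss`, the toy data) are illustrative assumptions.

```python
import numpy as np

def logistic_loss(margin):
    # l(z) = log(1 + exp(-z)), in a numerically stable form
    return np.logaddexp(0.0, -margin)

def pu_risk(w, x_p, x_u, prior):
    """Unbiased PU risk estimate for a linear scorer g(x) = x @ w.

    R(g) = pi * E_P[l(g, +1)] + ( E_U[l(g, -1)] - pi * E_P[l(g, -1)] ),
    where the bracketed term estimates the risk on (unseen) negatives
    from unlabeled data and labeled positives alone.
    """
    g_p = x_p @ w  # scores on labeled positive samples
    g_u = x_u @ w  # scores on unlabeled samples
    risk_pos = prior * logistic_loss(g_p).mean()       # pi * E_P[l(g, +1)]
    risk_neg = (logistic_loss(-g_u).mean()             # E_U[l(g, -1)]
                - prior * logistic_loss(-g_p).mean())  # - pi * E_P[l(g, -1)]
    return risk_pos + risk_neg

# Toy data: positives centered at +1, negatives at -1, class prior pi = 0.5.
rng = np.random.default_rng(0)
x_p = rng.normal(loc=1.0, size=(200, 1))
x_n = rng.normal(loc=-1.0, size=(200, 1))
x_u = np.vstack([x_p[:100], x_n[:100]])  # unlabeled mix of both classes
w = np.array([1.0])
print(pu_risk(w, x_p, x_u, 0.5))
```

Minimizing this estimate over `w` (e.g. by gradient descent) yields a two-class classifier without any labeled negatives; a differentially private variant would additionally add calibrated noise to the optimization.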
