In this work, we propose a new privacy-preserving membership query protocol that lets a centralized entity privately query datasets held by one or more other parties to check if they contain a given element. This protocol, based on elliptic curve-based ElGamal and oblivious key-value stores, ensures that those 'data-augmenting' parties only have to send their encrypted data to the centralized entity once, making the protocol particularly efficient when the centralized entity repeatedly queries the same sets of data. We apply this protocol to detect anomalies in cross-silo federations. Data anomalies across such cross-silo federations are challenging to detect because (1) the centralized entities have little knowledge of the actual users, (2) the data-augmenting entities do not have a global view of the system, and (3) privacy concerns and regulations prevent pooling all the data. Our protocol allows for anomaly detection even in strongly separated distributed systems while protecting users' privacy. Specifically, we propose a cross-silo federated architecture in which a centralized entity (the backbone) has labeled data to train a machine learning model for detecting anomalous instances. The other entities in the federation are data-augmenting clients (the user-facing entities) who collaborate with the centralized entity to extract feature values to improve the utility of the model. These feature values are computed using our privacy-preserving membership query protocol. The model can be trained with an off-the-shelf machine learning algorithm that provides differential privacy to prevent it from memorizing instances from the training data, thereby providing output privacy. However, it is not straightforward to also efficiently provide input privacy, which ensures that none of the entities in the federation ever see the data of other entities in an unencrypted form. We demonstrate the effectiveness of our approach in the financial domain, motivated by the PETs Prize Challenge, which is a collaborative effort between the US and UK governments to combat international fraudulent transactions. We show that the private queries significantly increase the precision and recall of the otherwise centralized system and argue that this improvement translates to other use cases as well.
Read full abstract