PPEM: Privacy‐preserving EM learning for mixture models

Sharon X Lee, Kaleb L Leemaqz, Geoffrey J McLachlan

doi:10.1002/cpe.5208

Abstract

Privacy is becoming increasingly important in collaborative data analysis, especially in analyses involving personal or sensitive information, which commonly arises in health and commercial settings. The aim of privacy‐preserving statistical algorithms is to allow inference to be drawn on the joint data without disclosing the private data held by each party. This paper presents a privacy‐preserving expectation–maximization (PPEM) algorithm for carrying out maximum likelihood estimation of the parameters of mixture models. We address the scenario of horizontally partitioned data distributed among three or more parties. The PPEM algorithm is a two‐cycle iterative distributed algorithm for fitting mixture models under privacy‐preserving requirements. A distinct advantage of PPEM is that it does not require a trusted third party for cooperative learning, unlike most existing schemes that implement a master/slave hierarchy. By adopting a ring topology and adding random noise to messages before encryption, PPEM helps prevent information leakage in the case of corrupted parties. Furthermore, in contrast to existing works, which typically assume an Honest‐but‐Curious adversary, we consider the much stronger case of a Malicious adversary. For illustration, PPEM is applied to two of the most popular mixture models, namely, the normal mixture model (NMM) and the t‐mixture model (tMM), and its effectiveness is analyzed through a security analysis. A real data example is also presented to evaluate the computational complexity and accuracy of PPEM relative to its non‐privacy‐preserving version.
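
To illustrate the ring-topology idea mentioned in the abstract, the following is a minimal sketch of a generic masked ring summation, in which an initiating party hides its contribution behind a random mask before the running total is passed from party to party. It is not the PPEM protocol itself: encryption and the two-cycle structure are omitted, and the names `ring_masked_sum`, `SCALE`, and `MODULUS`, as well as the example values, are assumptions made purely for illustration.

```python
import random

SCALE = 10**6          # hypothetical fixed-point scale for real-valued statistics
MODULUS = 2**61 - 1    # hypothetical modulus; must exceed the largest possible total


def ring_masked_sum(local_values, modulus=MODULUS, rng=None):
    """Sum non-negative integers held by different parties around a ring.

    Party 0 adds a random mask before sending, so each intermediate message
    looks uniformly random to the party that receives it. Only party 0 knows
    the mask and can remove it once the message has travelled the full ring.
    """
    rng = rng or random.SystemRandom()
    mask = rng.randrange(modulus)                  # secret of the initiating party
    running = (mask + local_values[0]) % modulus   # message leaving party 0
    for value in local_values[1:]:                 # each later party adds its share
        running = (running + value) % modulus
    return (running - mask) % modulus              # party 0 removes the mask


# Example: three parties each hold a local sum of posterior probabilities
# for one mixture component (values invented for this sketch).
local_stats = [12.5, 7.25, 30.0]
encoded = [round(x * SCALE) for x in local_stats]
total = ring_masked_sum(encoded) / SCALE
```

In an EM setting, pooled quantities of this kind (for example, the summed posterior probabilities used to update a mixing proportion) are all that the parameter updates require, which is why a secure summation primitive is a natural building block. A plain masked sum like this one offers no protection against colluding neighbours or a malicious party, which is the gap that the additional noise and encryption described in the abstract are meant to address.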

Full Text