Privacy-preserving semi-parallel logistic regression training with fully homomorphic encryption

Sergiu Carpov, Mariya Georgieva, Juan Ramon Troncoso-Pastoriza, Nicolas Gama

doi:10.1186/s12920-020-0723-0

Abstract

Background: Privacy-preserving computations on genomic data, and more generally on medical data, are a critical-path technology for innovative, life-saving research to positively and equally impact the global population. They enable medical research algorithms to be securely deployed in the cloud, because operations on encrypted genomic databases are conducted without revealing any individual genome. Methods for secure computation have shown significant performance improvements over the last several years. However, it is still challenging to apply them to large biomedical datasets.

Methods: The HE Track of the iDASH 2018 competition focused on solving an important problem in practical machine learning scenarios, where a data analyst who has trained a regression model (both linear and logistic) with a certain set of features attempts to find all features in an encrypted database that would improve the quality of the model. Our solution is based on the hybrid framework Chimera, which allows switching between different families of fully homomorphic encryption schemes, namely TFHE and HEAAN.

Results: Our solution is one of the finalists of Track 2 of the iDASH 2018 competition. Among the submitted solutions, ours is the only bootstrapped approach that can be applied to different sets of parameters without re-encrypting the genomic database, making it practical for real-world applications.

Conclusions: This is the first step towards the more general feature selection problem across large encrypted databases.

Highlights

Privacy-preserving computations on genomic data, and more generally on medical data, are a critical-path technology for innovative, life-saving research to positively and equally impact the global population
We propose a solution to semi-parallel logistic regression on encrypted genomic data based on fully homomorphic encryption that (a) leverages a novel framework, Chimera [15], to seamlessly switch between different Ring-LWE-based ciphertext forms, combining the advantages of each existing Ring-LWE-based cryptosystem to perform each step of the process more efficiently, (b) is generic, so that it can cope with arbitrary input sizes, and (c) features two configurations depending on the sought trade-off between accuracy and confidentiality
We describe below the main algorithms used for the TFHE scheme with TRLWE encryption, considering a security parameter λ = 128 and a minimal noise standard deviation α; these parameters implicitly define a minimal key size N ≈ max(256, 32·log2(1/α))
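To make the key-size bound concrete, the short Python sketch below evaluates it with the noise term read logarithmically, i.e. N ≈ max(256, 32·log2(1/α)). The helper names `min_key_size` and `ring_dimension` are hypothetical, and rounding up to a power of two is an illustrative assumption motivated by the ring Z[X]/(X^N + 1) underlying TRLWE; this is a back-of-the-envelope helper, not a security estimator.

```python
import math


def min_key_size(alpha, floor=256, slope=32):
    """Rough minimal TRLWE key size at ~128-bit security: N ~ max(256, 32*log2(1/alpha))."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be a noise standard deviation in (0, 1)")
    # Smaller noise (larger log2(1/alpha)) requires a proportionally larger key.
    return max(floor, slope * math.log2(1.0 / alpha))


def ring_dimension(alpha):
    """Round the bound up to a power of two (assumed, for the ring Z[X]/(X^N + 1))."""
    n = min_key_size(alpha)
    return 1 << math.ceil(math.log2(n))


if __name__ == "__main__":
    for a in (2**-5, 2**-15, 2**-25):
        print(f"alpha=2^{math.log2(a):.0f}: bound={min_key_size(a):.0f}, N={ring_dimension(a)}")
```

For example, α = 2^-15 gives 32·15 = 480, which rounds up to N = 512, while a larger noise such as α = 2^-5 is dominated by the floor of 256.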

Summary

Introduction

Privacy-preserving computations on genomic data, and more generally on medical data, are a critical-path technology for innovative, life-saving research to positively and equally impact the global population. In order to become feasible and usable for personalized medicine, these protection mechanisms must optimize the trade-off between the accuracy of the results, the efficiency of the computation, and the security level. In this context, the iDASH Privacy and Security Workshop has brought together experts in privacy-enhancing techniques, applied cryptography, and secure computation to design and implement secure, privacy-preserving solutions to fundamental genomics and bioinformatics problems. Linear and logistic regression are among the most common and versatile machine learning tools used in genomic studies. They are at the core of Genome-Wide Association Studies (GWAS), and their privacy-preserving implementation represents a first step towards effective and efficient outsourced machine learning on genomic data.
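To make the role of logistic regression in GWAS concrete, here is a plaintext sketch of semi-parallel logistic regression as it is commonly formulated: the covariate-only model is fitted once, and each candidate SNP s is then scored with a single Newton step from that fit, giving an estimate b = s*^T (y − p) / (s*^T W s*) with standard error 1/sqrt(s*^T W s*). Here p are the fitted probabilities, W = diag(p(1 − p)), and s* is s with its W-weighted projection onto the covariates removed. This is an illustration under these assumptions, not the authors' encrypted implementation; all function names are hypothetical, and only the standard library is used so that the linear algebra stays explicit.

```python
import math


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def predict(X, beta):
    """Logistic probabilities p_i = sigmoid(x_i . beta)."""
    return [1.0 / (1.0 + math.exp(-sum(xj * bj for xj, bj in zip(x, beta)))) for x in X]


def weighted_gram(X, w):
    """X^T W X for a diagonal weight vector w."""
    k = len(X[0])
    return [[sum(wi * x[j] * x[l] for wi, x in zip(w, X)) for l in range(k)] for j in range(k)]


def fit_logistic(X, y, iters=25):
    """Fit the covariate-only (base) logistic model by Newton-Raphson."""
    beta = [0.0] * len(X[0])
    for _ in range(iters):
        p = predict(X, beta)
        w = [pi * (1.0 - pi) for pi in p]
        grad = [sum(x[j] * (yi - pi) for x, yi, pi in zip(X, y, p)) for j in range(len(beta))]
        step = solve(weighted_gram(X, w), grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


def semi_parallel(X, y, snps):
    """One-step (semi-parallel) estimate and standard error for each candidate SNP."""
    beta = fit_logistic(X, y)  # fitted once, reused for every SNP
    p = predict(X, beta)
    w = [pi * (1.0 - pi) for pi in p]
    xtwx = weighted_gram(X, w)
    k = len(X[0])
    out = []
    for s in snps:
        # Remove the part of s explained by the covariates (W-weighted projection).
        xtws = [sum(wi * x[j] * si for wi, x, si in zip(w, X, s)) for j in range(k)]
        gamma = solve(xtwx, xtws)
        s_star = [si - sum(xj * gj for xj, gj in zip(x, gamma)) for x, si in zip(X, s)]
        info = sum(wi * ss * ss for wi, ss in zip(w, s_star))  # s*^T W s*
        score = sum(ss * (yi - pi) for ss, yi, pi in zip(s_star, y, p))  # s*^T (y - p)
        out.append((score / info, 1.0 / math.sqrt(info)))
    return out


if __name__ == "__main__":
    X = [[1.0, float(a)] for a in (-2, -1, 0, 1, 2, -2, -1, 0, 1, 2)]  # intercept + covariate
    y = [0, 0, 1, 0, 1, 0, 1, 0, 1, 1]
    print(semi_parallel(X, y, [[0, 1, 1, 2, 0, 1, 0, 2, 1, 0]]))
```

Because the covariate fit and X^T W X are computed once and reused, each additional SNP costs only a small k×k solve and a few weighted inner products, which is what makes the semi-parallel formulation cheap when many candidate features are tested against the same base model.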

Methods

Results

Discussion

Conclusion