Abstract

In recent years, high-throughput technologies have produced enormous amounts of biomedical data. However, biomedical data tend to be “wide” rather than “big”, i.e., they typically contain a huge number of features for relatively few samples. To increase the robustness and predictive power of machine learning models trained on biomedical data, it is hence desirable to increase sample sizes by jointly analyzing data sets available at several sites (e.g., hospitals). However, biomedical data are often extremely sensitive, so pooling them raises important privacy concerns. Consequently, federated learning (FL) has emerged as a privacy-aware alternative to centralized analyses on pooled data. In FL, sensitive data remain at the local sites and only model parameters are shared with a global aggregator. In my talk, I will introduce the promises and challenges of using FL for biomedical applications, using genome-wide association studies (GWAS) as an example. On the one hand, I will present recently developed tools that enable privacy-aware, federated GWAS in a user-friendly fashion while achieving high accuracy and practical runtimes. On the other hand, I will use the GWAS example to draw attention to a limitation of FL: although privacy risks that remain in FL can often be mitigated via algorithmic or cryptographic techniques once they are detected, there are no mathematical frameworks that make it possible to enumerate all possible sources of data leakage at the time of deployment. This suggests that the conflict between privacy preservation and the utility gained from pooled analyses of biomedical data is unlikely to be fully resolvable by purely technological means in the foreseeable future.
Instead, it might be necessary to develop risk-aware patient consent models, in which the remaining risks of state-of-the-art privacy-aware machine learning technologies are made transparent and privacy preservation is no longer treated as a sine qua non but rather as an important good to be weighed against others.
