Efficacy of federated learning on genomic data: a study on the UK Biobank and the 1000 Genomes Project.

Dmitry Kolobkov, Satyarth Mishra Sharma, Aleksandr Medvedev, Mikhail Lebedev, Egor Kosaretskiy, Ruslan Vakhitov

doi:10.3389/fdata.2024.1266031

Abstract

Combining training data from multiple sources increases sample size and reduces confounding, leading to more accurate and less biased machine learning models. In healthcare, however, direct pooling of data is often not allowed by data custodians, who are accountable for minimizing the exposure of sensitive information. Federated learning offers a promising solution to this problem by training a model in a decentralized manner, thus reducing the risk of data leakage. Although federated learning is increasingly used on clinical data, its efficacy on individual-level genomic data has not been studied. This study lays the groundwork for the adoption of federated learning for genomic data by investigating its applicability in two scenarios: phenotype prediction on the UK Biobank data and ancestry prediction on the 1000 Genomes Project data. We show that federated models trained on data split into independent nodes achieve performance close to that of centralized models, even in the presence of significant inter-node heterogeneity. Additionally, we investigate how federated model accuracy is affected by communication frequency and suggest approaches to reduce computational complexity or communication costs.
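The abstract describes federated training only at a high level. As an illustration, here is a minimal sketch of federated averaging (FedAvg), assumed as a representative algorithm; the abstract does not name the aggregation scheme the study uses. The linear model, the synthetic "shifted" nodes, and all function names are hypothetical. Each node trains on its own samples and only model weights are sent to the server. `local_steps` stands in for the communication-frequency setting: more local steps per round means fewer exchanges with the server.

```python
import random


def predict(w, x):
    # Linear model: dot product of weights and features.
    return sum(wi * xi for wi, xi in zip(w, x))


def local_update(w, data, lr, local_steps):
    """Full-batch gradient descent on one node's private data (mean squared error)."""
    w = list(w)
    n = len(data)
    for _ in range(local_steps):
        grad = [0.0] * len(w)
        for x, y in data:
            err = predict(w, x) - y
            for j, xj in enumerate(x):
                grad[j] += 2.0 * err * xj / n
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w


def fedavg(nodes, dim, rounds, local_steps, lr=0.1):
    """Federated averaging: only weights leave a node, never its samples.

    Each round, every node starts from the shared global weights, trains locally
    for `local_steps` steps, and the server averages the results weighted by node size.
    """
    w = [0.0] * dim
    total = sum(len(d) for d in nodes)
    for _ in range(rounds):
        local_weights = [local_update(w, d, lr, local_steps) for d in nodes]
        w = [
            sum(len(d) * lw[j] for d, lw in zip(nodes, local_weights)) / total
            for j in range(dim)
        ]
    return w


def make_node(rng, n, shift, true_w):
    # Hypothetical synthetic node: the feature distribution is shifted per node
    # to mimic inter-node heterogeneity, while the label model is shared.
    data = []
    for _ in range(n):
        x = [1.0, rng.gauss(shift, 1.0)]  # bias term + one feature
        y = predict(true_w, x) + rng.gauss(0.0, 0.1)
        data.append((x, y))
    return data


rng = random.Random(0)
TRUE_W = [2.0, -1.0]
nodes = [make_node(rng, 200, shift, TRUE_W) for shift in (-1.0, 0.0, 1.0)]

federated = fedavg(nodes, dim=2, rounds=50, local_steps=5)
# Centralized baseline: the same optimizer run on the pooled data (a single "node").
pooled = [sample for node in nodes for sample in node]
centralized = fedavg([pooled], dim=2, rounds=50, local_steps=5)

print("federated:  ", [round(v, 3) for v in federated])
print("centralized:", [round(v, 3) for v in centralized])
```

Raising `local_steps` while lowering `rounds` trades communication for local computation. With strongly heterogeneous nodes, too many local steps between rounds can let the node models drift apart before they are averaged.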

Journal: Frontiers in Big Data
Publication Date: Feb 29, 2024
Citations: 1
License type: CC BY 4.0

Similar Papers

MS12.04 The International Lung Cancer Consortium (ILCCO), an International Study to Identify Risk Factors for Lung Cancer Development
R.J Hung
Journal of Thoracic Oncology | VOL. 14
01 Oct 2019

Genetic Validation of Psoriasis Phenotyping in UK Biobank Supports the Utility of Self-Reported Data and Composite Definitions for Large Genetic and Epidemiological Studies
Jake R Saklatvala ... Nick Dand
Journal of Investigative Dermatology | VOL. 143
03 Mar 2023

Association between ovarian reserve and spontaneous miscarriage and their shared genetic architecture.
Yan Yi ... Yanping Li
Human Reproduction | VOL. 38
15 Sep 2023

Assessment of a causal relationship between body mass index and atopic dermatitis
Ashley Budu-Aggrey ... Sara J Brown
Journal of Allergy and Clinical Immunology | VOL. 147
17 May 2020