Creating artificial human genomes using generative neural networks.

Burak Yelmen, Aurélien Decelle, Corentin Tallec, Luca Pagani, Cyril Furtlehner, Francesco Montinaro, Linda Ongaro, Flora Jay, Davide Marnetto

doi:10.1371/journal.pgen.1009303

Abstract

Generative models have shown breakthroughs in a wide spectrum of domains due to recent advancements in machine learning algorithms and increased computational power. Despite these impressive achievements, the ability of generative models to create realistic synthetic data is still under-exploited in genetics and absent from population genetics. Yet a known limitation in the field is the restricted access to many genetic databases due to concerns about violations of individual privacy, although these databases would provide a rich resource for data mining and integration towards advancing genetic studies. In this study, we demonstrate that deep generative adversarial networks (GANs) and restricted Boltzmann machines (RBMs) can be trained to learn the complex distributions of real genomic datasets and generate novel high-quality artificial genomes (AGs) with little to no privacy loss. We show that our generated AGs replicate characteristics of the source dataset such as allele frequencies, linkage disequilibrium, pairwise haplotype distances and population structure. Moreover, they can also inherit complex features such as signals of selection. To illustrate the promising outcomes of our method, we show that imputation quality for low-frequency alleles can be improved by augmenting reference panels with AGs, and that the RBM latent space provides a relevant encoding of the data, allowing further exploration of the reference dataset and providing features for solving supervised tasks. Generative models and AGs have the potential to become valuable assets in genetic studies by providing a rich yet compact representation of existing genomes and high-quality, easy-access and anonymous alternatives for private databases.

Highlights

Availability of genetic data has increased tremendously due to advances in sequencing technologies and reduced costs [1].
We demonstrated that deep generative adversarial networks (GANs) and restricted Boltzmann machines (RBMs) can be trained to learn the complex distributions of real genomic datasets and generate novel high-quality artificial genomes (AGs) with little to no privacy loss.
We created AGs with a GAN, an RBM, and two simple generative models for comparison, a Bernoulli model and a Markov chain model, using 2504 individuals (5008 haplotypes) from the 1000 Genomes data [24], spanning 805 SNPs from all chromosomes, which reflect a high proportion of the population structure present in the whole dataset [25].
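
The two simple comparison models named in the last highlight can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names and the toy data are hypothetical, haplotypes are assumed to be lists of 0/1 alleles, and the Markov chain is assumed to be first-order, since the order is not stated here. The Bernoulli model samples every SNP independently from its observed allele frequency, while the Markov chain conditions each SNP on the allele at the previous position.

```python
import random


def allele_frequencies(haplotypes):
    """Frequency of allele 1 at each SNP position."""
    n = len(haplotypes)
    return [sum(column) / n for column in zip(*haplotypes)]


def sample_bernoulli(haplotypes, n_samples, rng):
    """Draw each site independently from its observed allele frequency."""
    freqs = allele_frequencies(haplotypes)
    return [[1 if rng.random() < p else 0 for p in freqs]
            for _ in range(n_samples)]


def sample_markov(haplotypes, n_samples, rng):
    """First-order chain (an assumption): site j depends only on site j-1."""
    n_sites = len(haplotypes[0])
    freqs = allele_frequencies(haplotypes)
    # trans[j-1][a] estimates P(site j = 1 | site j-1 = a) from counts.
    trans = []
    for j in range(1, n_sites):
        probs = []
        for a in (0, 1):
            nxt = [h[j] for h in haplotypes if h[j - 1] == a]
            # Fall back to the marginal frequency if context a is never seen.
            probs.append(sum(nxt) / len(nxt) if nxt else freqs[j])
        trans.append(probs)
    samples = []
    for _ in range(n_samples):
        hap = [1 if rng.random() < freqs[0] else 0]
        for j in range(1, n_sites):
            p = trans[j - 1][hap[-1]]
            hap.append(1 if rng.random() < p else 0)
        samples.append(hap)
    return samples


# Hypothetical toy panel: 4 haplotypes x 4 SNPs (not real data).
toy = [[0, 1, 1, 0], [0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 1]]
rng = random.Random(0)
bernoulli_ags = sample_bernoulli(toy, 50, rng)
markov_ags = sample_markov(toy, 50, rng)
```

In the toy panel the second SNP is always the opposite of the first, and the third always equals the second. The Markov sketch preserves these dependencies between adjacent sites, whereas the Bernoulli sketch matches only per-site allele frequencies. Neither baseline models dependencies between distant sites.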

Summary

Introduction

Availability of genetic data has increased tremendously due to advances in sequencing technologies and reduced costs [1]. The vast amount of human genetic data is used in a wide range of fields, from medicine to evolution. Cost is still a limiting factor and more data is always welcome, especially in population genetics and genome-wide association studies (GWAS), which usually require large numbers of samples. Because of these costs and a research bias toward studying populations of European ancestry, many autochthonous populations are under-represented in genetic databases, which reduces the resolution of many studies [2,3,4,5]. The majority of the data held by government institutions and private companies is considered sensitive and is not accessible due to privacy issues, which poses yet another barrier for scientific work. A class of machine learning methods called generative models might provide a suitable solution to these problems.

Journal: PLOS Genetics
Publication Date: Feb 4, 2021
Citations: 71
License type: CC BY 4.0
