On the cross-population generalizability of gene expression prediction models.

Kevin L Keys, Scott Huntsman, María G Contreras, Anna V Mikhaylova, Sandra Salazar, Noah Zaitlen, Christopher R Gignoux, Sam S Oh, Timothy A Thornton, Walter L Eckalbar, Marquitta J White, Donglei Hu, Jimmie C Ye, Jennifer R Elhawary, Esteban G Burchard, Joel Mefford, Angel C Y Mak, Celeste Eng, Michael A Lenoir, Andrew Dahl

doi:10.1371/journal.pgen.1008927

Abstract

The genetic control of gene expression is a core component of human physiology. For the past several years, transcriptome-wide association studies have leveraged large datasets of linked genotype and RNA sequencing information to create a powerful gene-based test of association that has been used in dozens of studies. While numerous discoveries have been made, the populations in the training data are overwhelmingly of European descent, and little is known about the generalizability of these models to other populations. Here, we test for cross-population generalizability of gene expression prediction models using a dataset of African American individuals with RNA-Seq data in whole blood. We find that the default models trained in large datasets such as GTEx and DGN fare poorly in African Americans, with a notable reduction in prediction accuracy when compared to European Americans. We replicate these limitations in cross-population generalizability using the five populations in the GEUVADIS dataset. Via realistic simulations of both populations and gene expression, we show that accurate cross-population generalizability of transcriptome prediction only arises when eQTL architecture is substantially shared across populations. In contrast, models with non-identical eQTLs showed patterns similar to real-world data. Therefore, generating RNA-Seq data in diverse populations is a critical step towards multi-ethnic utility of gene expression prediction.

Highlights

In the last decade, large-scale genome-wide genotyping projects have enabled a revolution in our understanding of complex traits [1,2,3,4].
Modern transcriptome-wide association analysis tools leverage existing paired genotype-expression datasets by creating models to predict gene expression using genotypes. These predictive models enable researchers to perform cost-effective association tests with gene expression in independently genotyped samples. Most of these models use European reference data, and the extent to which gene expression prediction models work across populations is not fully resolved.
We observe that models derived from European-descent individuals predict gene expression worse than expected in a dataset of African Americans.

Summary

Introduction

Large-scale genome-wide genotyping projects have enabled a revolution in our understanding of complex traits [1,2,3,4]. This explosion of genome sequencing data has spurred the development of new methods that integrate large genotype sets with additional molecular measurements such as gene expression. A recently popular integrative approach to genetic association analyses, known as a transcriptome-wide association study (TWAS) [5,6], leverages reference datasets such as the Genotype-Tissue Expression (GTEx) repository [7] or the Depression and Genes Network (DGN) [8] to link associated genetic variants with a molecular trait like gene expression. A TWAS is similar in spirit to the widely known genome-wide association study (GWAS) but carries a smaller multiple-testing burden and can therefore potentially detect more associations [5,6].
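The prediction step underlying a TWAS, and the cross-population question raised in the abstract, can be illustrated with a small simulation. The sketch below is not the authors' simulation design. It assumes independent SNPs (no linkage disequilibrium), uses closed-form ridge regression as a simple stand-in for whatever penalized model a given TWAS tool fits, and every function name and parameter value (`run_simulation`, `n_snps`, `h2`, and so on) is hypothetical. A model is trained in one simulated population and then scored in three test sets: the same population, a second population with the same eQTLs, and a second population with different eQTLs.

```python
import numpy as np


def simulate_genotypes(rng, n_samples, maf):
    """Draw 0/1/2 allele counts for independent SNPs (no LD; a simplification)."""
    return rng.binomial(2, maf, size=(n_samples, maf.size)).astype(float)


def simulate_expression(rng, genotypes, effects, h2):
    """Additive genetic signal plus Gaussian noise; genetic share of variance is about h2."""
    signal = genotypes @ effects
    noise_sd = np.sqrt(signal.var() * (1.0 - h2) / h2)
    return signal + rng.normal(0.0, noise_sd, size=signal.shape)


def fit_ridge(genotypes, expression, penalty):
    """Closed-form ridge regression on centered data (stand-in for a TWAS weight model)."""
    g_mean = genotypes.mean(axis=0)
    y_mean = expression.mean()
    centered = genotypes - g_mean
    gram = centered.T @ centered + penalty * np.eye(centered.shape[1])
    weights = np.linalg.solve(gram, centered.T @ (expression - y_mean))
    return weights, g_mean, y_mean


def predict(model, genotypes):
    """Apply trained weights to new genotypes."""
    weights, g_mean, y_mean = model
    return (genotypes - g_mean) @ weights + y_mean


def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted expression."""
    return float(np.corrcoef(observed, predicted)[0, 1] ** 2)


def run_simulation(seed=0, n_snps=50, n_eqtl=5, n_train=2000, n_test=1000, h2=0.5):
    rng = np.random.default_rng(seed)
    # Hypothetical allele frequencies for two populations, A (training) and B.
    maf_a = rng.uniform(0.05, 0.5, n_snps)
    maf_b = rng.uniform(0.05, 0.5, n_snps)

    # Population A: the first n_eqtl SNPs are causal.
    effects_a = np.zeros(n_snps)
    effects_a[:n_eqtl] = rng.normal(0.0, 1.0, n_eqtl)
    # Alternative architecture for population B: a disjoint set of causal SNPs.
    effects_b_distinct = np.zeros(n_snps)
    effects_b_distinct[n_eqtl:2 * n_eqtl] = rng.normal(0.0, 1.0, n_eqtl)

    train_g = simulate_genotypes(rng, n_train, maf_a)
    train_y = simulate_expression(rng, train_g, effects_a, h2)
    model = fit_ridge(train_g, train_y, penalty=10.0)

    scenarios = {
        "same_population": (maf_a, effects_a),
        "shared_eqtl": (maf_b, effects_a),
        "distinct_eqtl": (maf_b, effects_b_distinct),
    }
    results = {}
    for name, (maf, effects) in scenarios.items():
        test_g = simulate_genotypes(rng, n_test, maf)
        test_y = simulate_expression(rng, test_g, effects, h2)
        results[name] = r_squared(test_y, predict(model, test_g))
    return results


if __name__ == "__main__":
    for scenario, value in run_simulation().items():
        print(f"{scenario}: R^2 = {value:.3f}")
```

Under these assumptions, prediction accuracy in the second population should stay close to the within-population value when the causal eQTLs are shared and fall toward zero when they are not. The disjoint-eQTL case is a deliberately extreme version of the non-identical architectures described in the abstract; partially overlapping architectures would fall between the two.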

Journal: PLOS Genetics
Publication Date: Aug 14, 2020
Citations: 48
License type: CC BY 4.0

Similar Papers

On the cross-population generalizability of gene expression prediction models
Andrew W Dahl ... Angel C Y Mak
14 Aug 2020

Autosomal genetic control of human gene expression does not differ across the sexes.
Irfahan Kassam ... Jian Yang
Genome Biology | VOL. 17
01 Dec 2016

Abstract B065: A framework for transcriptome-wide association studies in breast cancer in diverse study populations
Arjun Bhattacharya ... Michael I Love
Cancer Epidemiology, Biomarkers & Prevention | VOL. 29
01 Jun 2020

Genetic Control of Left Atrial Gene Expression Yields Insights into the Genetic Susceptibility for Atrial Fibrillation.
Jeffrey Hsu ... David R Van Wagoner
Circulation: Genomic and Precision Medicine | VOL. 11
01 Mar 2018
