Abstract

The transcriptome-wide association study (TWAS) has emerged as one of several promising techniques for integrating multi-scale 'omics' data into traditional genome-wide association studies (GWAS). Unlike GWAS, which associates phenotypic variance directly with genetic variants, TWAS uses a reference dataset to train a predictive model of gene expression, which allows it to associate phenotype with variants through the mediating effect of expression. Although effective, this core innovation of TWAS is poorly understood, since the predictive accuracy of the genotype-expression model is generally low and is further bounded by expression heritability. This raises the question: to what degree does the accuracy of the expression model affect the power of TWAS? Furthermore, would replacing predictions with actual, experimentally determined expressions improve power? To answer these questions, we compared the power of GWAS, TWAS, and a hypothetical protocol utilizing real expression data. We derived non-centrality parameters (NCPs) for linear mixed models (LMMs) to enable closed-form calculations of statistical power that do not rely on specific protocol implementations. We examined two representative scenarios: causality (genotype contributes to phenotype through expression) and pleiotropy (genotype contributes directly to both phenotype and expression), and also tested the effects of various properties, including expression heritability. Our analysis reveals two main outcomes: (1) Under pleiotropy, using predicted expressions in TWAS yields higher power than using actual expressions. This explains why TWAS can function with weak expression models, and shows that TWAS remains relevant even when real expressions are available. (2) GWAS outperforms TWAS when expression heritability is below a threshold of 0.04 under causality, or 0.06 under pleiotropy. Analysis of existing publications suggests that TWAS has been misapplied in place of GWAS in situations where expression heritability is low.
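
To illustrate how a non-centrality parameter translates into a closed-form power value, the following is a minimal sketch, not the authors' derivation. It assumes a test statistic that follows a chi-square distribution with one degree of freedom, and a genome-wide significance level of 5e-8; both are assumptions made here for illustration, as are the function and parameter names. The NCP formulas for GWAS, TWAS, and the real-expression protocol are not given in the abstract and are not reproduced.

```python
from statistics import NormalDist


def power_from_ncp(ncp: float, alpha: float = 5e-8) -> float:
    """Power of a 1-df chi-square test given its non-centrality parameter.

    Assumption (not from the source): the statistic is (Z + sqrt(ncp))**2
    with Z standard normal, i.e. a non-central chi-square with 1 df.
    """
    if ncp < 0:
        raise ValueError("ncp must be non-negative")
    std_normal = NormalDist()
    # Rejecting when chi2_1 > z_crit**2 is the same as rejecting when |Z| > z_crit.
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = ncp ** 0.5
    # P(|Z + shift| > z_crit) = P(Z > z_crit - shift) + P(Z < -z_crit - shift)
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Hypothetical NCP values, chosen only to show how power grows with the NCP.
    for ncp in (10.0, 30.0, 50.0):
        print(f"NCP = {ncp:5.1f}  power = {power_from_ncp(ncp):.3f}")
```

With an NCP of zero the function returns the significance level itself, which is a quick sanity check. Comparing protocols then reduces to comparing their NCPs at the same sample size and threshold.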

Highlights

  • High-throughput sequencing instruments have enabled the rapid profiling of transcriptomes (RNA expression of genes) [1,2,3,4], proteomes [5,6,7] and other ‘omics’ data [8,9,10]

  • The transcriptome-wide association study (TWAS) has improved genome-wide association studies (GWAS) by estimating the effect of each genetic variant on the activity level of genes related to disease

  • The effectiveness of TWAS is surprising because its predictions of gene expression have low accuracy, so we ask whether a method using real expression data instead of predictions would perform better


Summary

Introduction

High-throughput sequencing instruments have enabled the rapid profiling of transcriptomes (RNA expression of genes) [1,2,3,4], proteomes (proteins) [5,6,7] and other ‘omics’ data [8,9,10]. These ‘omics’ provide insight into the intermediary effects of genotypes on endophenotypes, and can improve the ability of genome-wide association studies (GWAS) to find associations between genetic variants and disease phenotypes. The transcriptome-wide association study (TWAS) has achieved significant popularity and success in identifying the genetic basis of complex traits [21,22,23,24,25,26,27], inspiring similar protocols for other endophenotypes such as IWAS for images [28] and PWAS for proteins [29].

