Abstract

Background

Genetically regulated gene expression, as assessed via transcriptome‐wide association studies (TWAS), can be used to elucidate the mechanistic basis of Alzheimer’s disease (AD). A limitation of current TWAS approaches is the requirement for individual‐level datasets with both genotypes and gene expression measures to produce effective prediction models. We sought to improve the predictive performance of existing gene‐based models by leveraging easily accessible summary statistics that are powered by large sample sizes.

Methods

Brain cis‐eQTL data produced from a large‐scale meta‐analysis were restricted to data from European‐descent individuals and samples taken from Brain‐Cortex (average N = 2,547). The effect size estimates and their standard errors from the individual cohorts (N = 14) were used to re‐run the meta‐analysis via METAL, excluding GTEx data. To perform gene‐based elastic‐net regressions using the resulting summary statistics, the lassosum R package (Mak et al. 2017) was applied with genotype data from 1000 Genomes (GRCh38) serving as a reference panel and GTEx v8 genotype, normalized Brain‐Cortex gene expression, and covariate data (N = 205) serving as a testing panel for model validation and dynamic selection of the optimal mixing/penalty parameters. The maximum R² produced by this pipeline was compared to that of elastic‐net models generated by PredictDB using Brain‐Cortex GTEx v8 data, where the mixing parameter was set at 0.5.

Results

Our pipeline produced gene‐based models that had improved predictive performance, averaging 9.8% (SD: 7%) based on R², over PredictDB models in 93.7% of genes tested (N = 3,965) when restricting our pipeline to SNPs found in PredictDB. The optimal mixing parameter differed dramatically from the PredictDB value of 0.5, with a median of 0.009, highlighting the utility of including variants with small effects. Even when restricting the mixing parameter to 0.5 for genes with a converged model, our pipeline outperformed PredictDB in 85% of genes (N = 369). When examining expression prediction metrics for genes from the KEGG Alzheimer’s Disease pathway (hsa05010), our pipeline had improved performance in 100% of matched genes (N = 30).

Conclusion

We demonstrate improvement upon existing TWAS reference panels and increased gene‐expression predictive performance by leveraging the power of summary statistics. Our findings show that AD association studies based on TWAS approaches may be improved by meta‐analysis reference panels.
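
The meta-analysis step in the Methods combines per-cohort effect size estimates and standard errors. As an illustration only, the following is a minimal sketch of a standard fixed-effects, inverse-variance-weighted combination for a single SNP-gene pair, which is the kind of calculation a standard-error-based meta-analysis performs. It is not the authors' pipeline and does not call METAL; the function name `ivw_meta` and the example values are hypothetical.

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effects inverse-variance-weighted meta-analysis.

    betas: per-cohort effect size estimates for one SNP-gene pair
    ses:   matching per-cohort standard errors (all must be > 0)
    Returns (pooled_beta, pooled_se, z_score).
    """
    if len(betas) != len(ses) or len(betas) == 0:
        raise ValueError("betas and ses must be non-empty and equal in length")
    if any(se <= 0 for se in ses):
        raise ValueError("standard errors must be positive")

    # Each cohort is weighted by the inverse of its sampling variance.
    weights = [1.0 / (se ** 2) for se in ses]
    total_weight = sum(weights)

    # Pooled effect is the weighted mean of the cohort effects.
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    # Pooled standard error shrinks as total weight grows.
    pooled_se = math.sqrt(1.0 / total_weight)

    return pooled_beta, pooled_se, pooled_beta / pooled_se


# Hypothetical example: two cohorts with equal precision.
beta, se, z = ivw_meta([0.2, 0.4], [0.1, 0.1])
print(f"beta={beta:.3f} se={se:.4f} z={z:.2f}")
```

Excluding a cohort, as done here for GTEx so that it can serve as an independent testing panel, corresponds in this sketch to dropping that cohort's entries from both input lists before pooling.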
