Abstract

Here we train cis-regulatory models of prostate tissue gene expression and impute expression transcriptome-wide for 233,955 European ancestry men (14,616 prostate cancer (PrCa) cases, 219,339 controls) from two large cohorts. Among 12,014 genes evaluated in the UK Biobank, we identify 38 associated with PrCa, many replicating in the Kaiser Permanente RPGEH. We report the association of elevated TMPRSS2 expression with increased PrCa risk (independent of a previously reported risk variant) and with increased tumoral expression of the TMPRSS2:ERG fusion-oncogene in The Cancer Genome Atlas, suggesting a novel germline-somatic interaction mechanism. Three novel genes, HOXA4, KLK1, and TIMM23, additionally replicate in the RPGEH cohort. Furthermore, four genes, MSMB, NCOA4, PCAT1, and PPP1R14A, are associated with PrCa in a trans-ethnic meta-analysis (N = 9117). Many genes exhibit evidence for allele-specific transcriptional activation by PrCa master-regulators (including androgen receptor) in Position Weight Matrix, ChIP-Seq, and Hi-C experimental data, suggesting common regulatory mechanisms for the associated genes.

Highlights

  • To estimate genetically regulated expression among the study subjects, we developed novel models using a large number of samples (N = 471 subjects; Fig. 1a) with paired prostate tissue gene expression measurements and germline genotypes[5].

  • Training and validation of novel prostatic expression models

  • We found that, when considering 101 T2E-positive specimens carrying the gene fusion, predicted levels of TMPRSS2 expression in normal prostate tissue were positively correlated with observed ERG expression levels as measured by RNA-Seq, a proxy for the expression levels of the T2E fusion (Pearson’s r [95% CI] = 0.208 [0.013, 0.387], linear regression p = 0.037; residual Shapiro-Wilk p = 0.138; Fig. 3b).
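As a rough check on the correlation reported in the last highlight, the sketch below recomputes a 95% confidence interval for r = 0.208 with n = 101 specimens. It assumes the interval was obtained with the Fisher z-transformation, which the source does not state; the function names are illustrative only. Under that assumption the result lands close to the reported [0.013, 0.387].

```python
import math


def pearson_fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for a Pearson correlation via the Fisher z-transformation.

    Assumption: the source does not say how its interval was computed;
    Fisher's method is used here only as a plausible reconstruction.
    """
    z = math.atanh(r)                 # map r onto the z scale
    se = 1.0 / math.sqrt(n - 3)       # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


def pearson_t_statistic(r, n):
    """t statistic (n - 2 degrees of freedom) for testing r = 0.

    This equals the slope t statistic of a simple linear regression.
    """
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Values taken from the highlight above.
r_reported = 0.208
n_specimens = 101

ci_low, ci_high = pearson_fisher_ci(r_reported, n_specimens)
t_stat = pearson_t_statistic(r_reported, n_specimens)

print(f"95% CI: [{ci_low:.3f}, {ci_high:.3f}]")
print(f"t statistic on {n_specimens - 2} df: {t_stat:.2f}")
```

The t statistic is a little above 2 on 99 degrees of freedom, which is in line with a two-sided p-value slightly below 0.05, as reported.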


Summary

Introduction

To estimate genetically regulated expression among the study subjects, we developed novel models using a large number of samples (N = 471 subjects; Fig. 1a) with paired prostate tissue gene expression measurements and germline genotypes[5]. These models improve upon the commonly used Genotype-Tissue Expression (GTEx, v6p) dataset[6], which includes far fewer prostate samples (N = 87). Restricting the modeled genotypes to variants within 500 kb of gene boundaries, rather than the 1 Mb window implemented by PredictDB[6], gave a similar out-of-sample predictive accuracy for TCGA normal expression (mean Spearman’s rho = 0.077 vs 0.074, t-test p = 0.22; Supplementary Fig. 3).
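The window restriction described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Variant` and `Gene` types, the `cis_variants` helper, and all coordinates are hypothetical, and it shows only how a 500 kb window keeps fewer candidate variants per gene than a 1 Mb window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    rsid: str    # hypothetical identifier
    chrom: str
    pos: int     # base-pair position


@dataclass(frozen=True)
class Gene:
    name: str    # hypothetical gene label
    chrom: str
    start: int   # gene boundary, base pairs
    end: int


def cis_variants(gene, variants, window_bp=500_000):
    """Keep variants on the gene's chromosome within window_bp of its boundaries."""
    lo = max(0, gene.start - window_bp)
    hi = gene.end + window_bp
    return [v for v in variants if v.chrom == gene.chrom and lo <= v.pos <= hi]


# Toy data: every coordinate here is made up for illustration.
gene = Gene("GENE_A", "chr1", 2_000_000, 2_050_000)
variants = [
    Variant("v1", "chr1", 1_400_000),  # 600 kb upstream of the gene start
    Variant("v2", "chr1", 1_600_000),  # 400 kb upstream of the gene start
    Variant("v3", "chr1", 2_025_000),  # inside the gene
    Variant("v4", "chr1", 2_700_000),  # 650 kb downstream of the gene end
    Variant("v5", "chr2", 2_025_000),  # different chromosome
]

narrow = [v.rsid for v in cis_variants(gene, variants, window_bp=500_000)]
wide = [v.rsid for v in cis_variants(gene, variants, window_bp=1_000_000)]

print("500 kb window:", narrow)
print("1 Mb window:  ", wide)
```

In this toy case the narrower window drops the two more distant variants, so a model trained on it has fewer predictors to fit.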

