Abstract: Inherited genetic variants are estimated to account for 30-35% of overall breast cancer risk. A few rare, highly penetrant genetic determinants of risk in humans have been well defined. However, the actions of many common, weakly penetrant breast cancer risk loci remain uncharacterized. Additionally, it is well established that endocrine factors in general, and estrogens in particular, influence breast cancer etiology. We are using the ACI rat model of 17β-estradiol (E2)-induced mammary cancer to parse the contributions of individual genetic risk variants to breast cancer susceptibility in a physiologically relevant context. ACI females develop mammary carcinomas at an incidence approaching 100% when exposed to physiological levels of E2, and these carcinomas share many features with luminal-type breast cancers in humans. In contrast, Brown Norway (BN) rats are highly resistant to E2-induced mammary cancer. Linkage analyses of progeny from intercrosses between susceptible ACI and resistant BN rats led to the identification of multiple quantitative trait loci for E2-induced mammary cancer. One such locus, Estrogen-induced mammary cancer 4 (Emca4), is the focus of the current investigation. We generated a series of novel congenic rat strains that carry BN alleles at distinct regions of interest across the Emca4 locus, introgressed onto the ACI genetic background. Characterization of mammary cancer phenotypes in the congenic strains facilitated fine-resolution mapping of the Emca4 locus. These studies revealed that Emca4 is a complex locus harboring at least four interacting genetic determinants of risk, designated Emca4.1 – Emca4.4, and is orthologous to the 8q24.21 breast cancer risk locus in humans. To assess the relevance of the rat genetic data to human populations, novel machine learning methods were employed to generate risk prediction models using data from a human cohort.
Genotype data for 76 SNPs located in the regions of the human genome orthologous to Emca4.1 – Emca4.4 were obtained from the Cancer Genetic Markers of Susceptibility case-control population. Models generated from this data set were optimized with novel algorithms to identify a subset of 16 variants that significantly influenced the risk models. The best model distinguished breast cancer cases from controls with a remarkably high degree of accuracy for a model based on genotype (AUC = 0.6, P < 10⁻¹¹ relative to random guessing). Notably, the predictive power of this model arose from interactions between human SNPs. Our data show that Emca4 is a complex locus containing multiple interacting determinants of risk; that variants in the orthologous 8q24.21 breast cancer risk locus in humans interact to influence breast cancer risk, as predicted by the rat model; and that accounting for interactions between variants achieves a predictive power beyond what is observed with individual SNPs. We have demonstrated, for the first time, the ability to develop a multi-component genetic model in rats and test it in a human population. This illustrates the power of the rat model to elucidate the complex mechanisms through which common, weakly penetrant variants influence breast cancer risk in humans.
Citation Format: Dennison KL, Chack A, Escanilla NS, Page D, Shull JD. A laboratory/machine learning based comparative genetics model accurately predicts breast cancer in a human cohort [abstract]. In: Proceedings of the 2017 San Antonio Breast Cancer Symposium; 2017 Dec 5-9; San Antonio, TX. Philadelphia (PA): AACR; Cancer Res 2018;78(4 Suppl):Abstract nr P5-05-02.