Abstract

Around 10% of the genetic predisposition to breast cancer is explained by mutations in high- or moderate-penetrance genes. The remaining proportion is explained by multiple common variants of relatively small effect. A subset of these variants has been identified, mostly in Europeans and Asians, and combined into polygenic risk scores (PRS) to predict breast cancer risk. Our aim is to identify a subset of variants that improves breast cancer risk prediction in Hispanics/Latinas (H/Ls).

Breast cancer patients were recruited at the Instituto Nacional de Enfermedades Neoplásicas in Peru as part of The Peruvian Genetics and Genomics of Breast Cancer Study (PEGEN). Women without a diagnosis of breast cancer from a pregnancy outcomes study conducted in Peru were included as controls. After quality control filters, genome-wide genotypes were available for 1,809 cases and 3,334 controls. Missing genotypes were imputed with the Michigan Imputation Server, using individuals from the 1000 Genomes Project as the reference. Genotypes for 313 previously reported breast cancer-associated variants and 2 Latin American-specific single nucleotide polymorphisms (SNPs) were extracted from the data, using an imputation r² filter of 30%. Feature selection techniques were used to identify the best subset of SNPs for breast cancer prediction in Peruvian women. We randomly split the PEGEN data in a 4:1 ratio into training/validation and testing sets. Training/validation data were resampled and split in a 3:1 ratio into training and validation sets. SNP ranking and selection were done by bootstrapping results from 100 resampled training and validation sets. PRS were built by summing counts of risk alleles weighted by previously reported beta coefficients. The area under the curve (AUC) was used to estimate the prediction accuracy of the subsets of SNPs selected with the different techniques.
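
The scoring and evaluation steps described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the toy genotypes, and the beta values are hypothetical, the split is a plain unstratified shuffle (the abstract does not say whether the split was stratified), and the feature selection step is omitted. It only shows a PRS as a beta-weighted sum of risk-allele counts, a rank-based AUC, and a 4:1 random split.

```python
import random


def polygenic_risk_score(dosages, betas):
    """Sum of risk-allele counts (0, 1 or 2 per SNP) weighted by per-SNP betas."""
    if len(dosages) != len(betas):
        raise ValueError("one beta per SNP is required")
    return sum(d * b for d, b in zip(dosages, betas))


def auc(scores, labels):
    """Rank-based AUC: probability that a case outscores a control (ties count 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def split_indices(n, held_out_fraction=0.2, seed=0):
    """Shuffle 0..n-1 and hold out a fraction (0.2 gives a 4:1 split)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_held_out = int(round(n * held_out_fraction))
    return idx[n_held_out:], idx[:n_held_out]


# Toy data (hypothetical): 4 individuals, 3 SNPs, 1 = case, 0 = control.
betas = [0.10, -0.25, 0.05]  # a negative beta marks a protective allele
genotypes = [[2, 0, 1], [1, 2, 0], [0, 1, 2], [2, 1, 1]]
labels = [1, 0, 0, 1]

scores = [polygenic_risk_score(g, betas) for g in genotypes]
toy_auc = auc(scores, labels)

# 4:1 split into training/validation and testing indices.
trainval_idx, test_idx = split_indices(10, held_out_fraction=0.2, seed=0)
```

Calling `split_indices` again on the training/validation indices with `held_out_fraction=0.25` and a fresh seed would mimic one of the resampled 3:1 training and validation splits.
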
Logistic regression was used to test the association between standardized PRS residuals (after adjustment for genetic ancestry) and breast cancer risk.

Of the 315 reported variants, 274 were available in the imputed dataset. The full 274-SNP PRS had an AUC of 0.63 (95% CI = 0.59-0.66) in the PEGEN study. Using different feature selection methods, we found subsets of SNPs with AUC values between 0.65 and 0.69. The best method (AUC = 0.69, 95% CI = 0.66-0.72) selected a subset of 98 SNPs. Sixty-eight SNPs were selected by all methods, including the protective SNP rs140068132 in the 6q25 region, which is associated with Indigenous American ancestry and made the largest contribution to the AUC.

We identified a subset of 98 SNPs from a previously identified breast cancer PRS that improves breast cancer risk prediction, compared to the full set, in women of high Indigenous American ancestry from Peru. Replication in women from Mexico and Colombia, and in H/Ls from the U.S., will allow us to confirm these results.

Citation Format: Valentina A. Zavala, Tatiana Vidaurre, Xiaosong Huang, Sandro Casavilca, Jeannie Navarro, Michelle A. Williams, Sixto Sanchez, Elad Ziv, Luis Carvajal-Carmona, Susan L. Neuhausen, Bizu Gelaye, Laura Fejerman. Identification of optimal set of genetic variants from a previously reported polygenic risk score for breast cancer risk prediction in Latin American women [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2022; 2022 Apr 8-13. Philadelphia (PA): AACR; Cancer Res 2022;82(12_Suppl):Abstract nr 3683.
