Abstract

Background: There are well-established disparities, related to ethnicity, in the presentation, distribution of subtypes and prognosis of breast cancer. To understand the possible biological bases for these differences, it is important to determine in an unbiased manner the ethnicity/geographical origin of patients presenting with breast cancer. We downloaded Affymetrix SNP 6.0 data generated by The Cancer Genome Atlas project (TCGA) and extracted ethnically informative SNPs to infer the ethnicity/geographical origin of these patients.

Materials and Methods: The level one SNP data (Affymetrix SNP Array 6.0) for 536 normal DNAs from IBC patients, as well as other relevant data, were downloaded from the TCGA project data portal (https://tcga-data.nci.nih.gov/tcga/tcgaHome2.jsp). 103 samples were excluded for technical reasons (lack of subtype data or technical outliers). The Bioconductor package “crlmm” was used for genotype calling. The HapMap PCA loadings for Affymetrix SNP probes were downloaded from http://www.stats.ox.ac.uk/~davison/software/shellfish/shellfish.php. Probe annotations were extracted from the Affymetrix GenomeWideSNP_6_na30.annot.csv file. “EIGENSTRAT” was used for Principal Component Analysis (PCA). Samples with unknown ethnicity were manually assigned based on their HapMap classification.

Results: Samples were clustered and ethnicity was inferred for three major ethnic groups (White, African and Asian) based on 168,905 informative SNPs. These clusters were generally in agreement with the clinical assignment of White, Black or African American, and Asian. Ten samples clinically assigned as Black or African American were placed between the White and African clusters in the HapMap clustering, suggesting admixture. This contrasts with other large breast cancer cohorts, such as that of Curtis et al. 2012, in which there were virtually no patients with African genotypes and admixture was evident along the European-Asian axis. 113 samples with unknown ethnicity were assigned based on their HapMap classification. In total, 371 White, 25 African, 20 Asian, and 10 admixed samples had subtype information available. The subtype distribution was LumA 176 (46.68%), LumB 90 (23.87%), Her2 37 (9.81%), Basal 68 (18.04%) in the White group and LumA 15 (33.33%), LumB 8 (17.78%), Her2 9 (20%), Basal 13 (28.89%) in non-Whites. The subtype distribution differed significantly between White and non-White populations (p < 0.05, chi-square test); 7 Normal-like samples were excluded because of their low count. The Basal subtype was more frequent among Africans than in other ethnic groups. The LumB subtype was also more frequent in the 10 admixed samples, half of which were of the LumB subtype.

Discussion: Inferring ethnicity/geographical origin by genotype removes the bias known to occur in self-reporting. We have successfully classified the TCGA breast cancer cohort into different ethnicity groups based on their ethnically informative SNP data. Although the SNP-based classification is generally in agreement with the clinical assignment, some samples are identified as admixtures between White and African. The subtype distribution varies by ethnicity/geographical origin and in the admixed population.

Citation Information: Cancer Res 2012;72(24 Suppl):Abstract nr P3-09-03.
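
The chi-square comparison reported in the Results can be illustrated with a minimal sketch. It is not the authors' analysis code, and the abstract does not say which software or which variant of the test was used. The sketch assumes a plain Pearson chi-square test of independence on the 2 x 4 table of reported subtype counts (Normal-like samples excluded), and the function and variable names are hypothetical.

```python
# Subtype counts as reported in the Results (Normal-like samples excluded).
SUBTYPES = ["LumA", "LumB", "Her2", "Basal"]
observed = {
    "White": [176, 90, 37, 68],
    "non-White": [15, 8, 9, 13],
}

# Standard chi-square critical value for alpha = 0.05 with 3 degrees of freedom.
CRITICAL_0_05_DF3 = 7.815


def chi_square_statistic(table):
    """Return (statistic, degrees of freedom) for a Pearson chi-square
    test of independence on a dict of equally long count rows."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for row, row_total in zip(rows, row_totals):
        for count, col_total in zip(row, col_totals):
            # Expected count if subtype were independent of group.
            expected = row_total * col_total / grand_total
            statistic += (count - expected) ** 2 / expected

    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return statistic, dof


stat, dof = chi_square_statistic(observed)
significant = stat > CRITICAL_0_05_DF3
print(f"chi-square = {stat:.2f}, df = {dof}, significant at 0.05: {significant}")
```

Under these assumptions the statistic comes out at roughly 8.4 on 3 degrees of freedom, which exceeds the 0.05 critical value and is consistent with the reported p < 0.05. One expected count (non-White Her2) falls just below 5, so an exact test may be worth considering as a cross-check.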

