Abstract Background: PAM (Prediction Analysis of Microarray) 50 is an established gene expression-based algorithm that classifies breast tumors into basal-like, HER2-enriched, luminal A (LA), and luminal B (LB) subtypes. Clinical subtyping is mainly based on immunohistochemistry (IHC) assays of estrogen receptor (ER), progesterone receptor (PR), human epidermal growth factor receptor 2 (Her2), and Ki67, which classify tumors into triple-negative (ER-/PR-/Her2-), Her2+ (ER-/PR-/Her2+), LA (ER+/Her2-/Ki67-), LB1 (ER+/Her2-/Ki67+), and LB2 (ER+/Her2+). The two subtyping methods do not completely agree, even on comparable subtypes. Nevertheless, the ER-balanced subset used for gene centering in PAM50 subtyping was selected based on clinical status. Here we explored using principal component analysis and iterative PAM50 calls to refine the selection of the ER-balanced subset and thereby improve consistency between the two methods, focusing on LB calls because LB tumors are more aggressive than LA tumors. Methods: Normalized gene expression data were obtained from the TCGA Research Network for 712 primary tumors with IHC status available for ER, PR, and Her2. Because Ki67 status was not available, ER+ cases were classified as LA if Her2- and as LB if Her2+. The in-house RNA-Seq dataset comprised 118 primary tumors drawn from the Clinical Breast Care Project, in which breast cancer patients were consented under an IRB-approved protocol. Tumors were selected and processed by laser microdissection. RNA was extracted from tissues using the Illustra triplePrep kit (GE Healthcare). Paired-end mRNA sequencing was performed on the Illumina HiSeq platform. Sequenced reads were processed with a Perl-based pipeline using PRINSEQ, GSNAP, and HTSeq. Principal component analysis (PCA) was performed in R. The Wilcoxon rank-sum test was used to assess statistical significance (p<0.05). Results: In both datasets, the grouping of cases on the PCA map did not perfectly reflect the clinical subtypes.
This motivated us to select the ER-balanced subset based on PC1 separation and IHC subtype. On the PCA map, the resulting PAM50 subtypes distinguished Basal and LA as two well-separated components. Using all Basal cases and an equal number of LA cases as the ER-balanced subset for PAM50 increased the number of LB calls and improved consistency with IHC LB calls. Among the 712 TCGA cases, LB calls increased from 142 in the initial PAM50 call to 203 in the refined, ER-balanced PAM50 call. We observed significantly higher MKI67 expression (p-value = 4.414e-11) in the 39 cases that switched from LA to LB between PAM50 calls. A similar trend was observed in our in-house dataset, where the majority of the IHC-LB1 cases were called LB by PAM50. The new method increased LB calls from 22 to 27, which in turn increased consistency between molecular and clinical subtypes from 73 to 79 of the 118 cases. Conclusion: We show that an iterative PAM50 call coupled with PCA-based selection of the ER-balanced set potentially enhances the consistency of LB calls with clinical subtyping, and that the tumors switched from LA to LB have high MKI67 expression. The views expressed in this article are those of the author and do not reflect the official policy of the Department of Army/Navy/Air Force, the Department of Defense, or U.S. Government. Citation Format: Raj-Kumar P-K, Liu J, Kovatich AJ, Kvecher L, Shriver CD, Hu H. Use of principal component analyses to select ER-balanced subset for gene centering in PAM50 subtyping [abstract]. In: Proceedings of the 2017 San Antonio Breast Cancer Symposium; 2017 Dec 5-9; San Antonio, TX. Philadelphia (PA): AACR; Cancer Res 2018;78(4 Suppl):Abstract nr P2-06-04.
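Illustrative sketch (not the authors' code): the ER-balanced gene-centering step described in the Results might be outlined as below. The abstract does not state how the LA cases matched to the Basal cases were chosen, so this sketch assumes they are the LA calls lying farthest along PC1 from the mean Basal score. It also assumes median centering per gene. The function names `er_balanced_subset` and `center_genes`, the subtype labels, and the toy values are all hypothetical. The centered values would then be passed back to a PAM50 classifier for the refined call, which is not shown here.

```python
from statistics import median


def er_balanced_subset(calls, pc1, basal="Basal", lum_a="LumA"):
    """Return all Basal sample IDs plus an equal number of LumA sample IDs.

    Assumption (not stated in the abstract): LumA cases are ranked by their
    distance along PC1 from the mean Basal score, and the farthest are kept.
    """
    basal_ids = [s for s, c in calls.items() if c == basal]
    lum_a_ids = [s for s, c in calls.items() if c == lum_a]
    if not basal_ids:
        raise ValueError("no Basal calls to balance against")
    basal_mean = sum(pc1[s] for s in basal_ids) / len(basal_ids)
    ranked = sorted(lum_a_ids, key=lambda s: abs(pc1[s] - basal_mean), reverse=True)
    return basal_ids + ranked[:len(basal_ids)]


def center_genes(expr, subset):
    """Subtract each gene's median over `subset` from every sample's value."""
    centered = {}
    for gene, values in expr.items():
        ref = median(values[s] for s in subset)
        centered[gene] = {s: v - ref for s, v in values.items()}
    return centered


# Toy data (hypothetical): initial PAM50 calls and PC1 scores per sample.
calls = {"s1": "Basal", "s2": "Basal", "s3": "LumA",
         "s4": "LumA", "s5": "LumA", "s6": "LumB"}
pc1 = {"s1": -5.0, "s2": -4.0, "s3": 3.0, "s4": 0.5, "s5": 4.0, "s6": 1.0}
expr = {"ESR1": {"s1": 1.0, "s2": 2.0, "s3": 9.0,
                 "s4": 8.0, "s5": 10.0, "s6": 7.0}}

subset = er_balanced_subset(calls, pc1)
centered = center_genes(expr, subset)
print(subset)
print(centered["ESR1"])
```

In this toy case the subset holds the two Basal samples and the two LA samples farthest from them on PC1, and every sample, including those outside the subset, is centered on the subset median.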