Abstract P6-08-04: Creation of a robust algorithm utilizing minimal gene sets normalized against a reference gene set to identify triple-negative breast cancer (TNBC) subtypes

Rob S Seitz,Jennifer A Pietenpol,Xi Chen,Rebecca B Smith,Brian D Lehmann,David R Hout,Brian Z Ring,Stephan W Morris

doi:10.1158/1538-7445.sabcs14-p6-08-04

Abstract

Introduction: Treatment of TNBC has been challenging due to the absence of well-characterized molecular targets and the heterogeneity of the disease. Using TNBC gene expression (GE) microarray profiles, the Pietenpol group molecularly binned the malignancy into six distinct subtypes: two basal-like [BL1, BL2], two mesenchymal-like [M, MSL], an immunomodulatory [IM], and a luminal subtype expressing androgen receptor [LAR]. Importantly, subtype-specific TNBC cell lines exhibit different sensitivities to various targeted and conventional chemotherapies currently employed or under investigation for the treatment of TNBC (1).

Background: The original TNBCtype algorithm was generated by a meta-analysis of existing GE data from tumor tissue, clustering the data into the six subtypes listed above and a seventh "unclassified" subtype (1). To transition the test into the clinic, we have modified the method of classification by both reducing the number of signature genes and normalizing the data against a reference set of endogenously expressed genes.

Methods: Gene set enrichment followed by shrunken centroid analysis was used for feature reduction, resulting in 258 genes used for model building. The IM class was excluded from the feature reduction analysis as it likely represents the presence of immune infiltrates rather than a distinct tumor class. Linear regression, targeted minimum loss-based estimation, random forest, and elastic-net regularized linear models were employed, with the latter giving the best fit with the fewest required genes. Models were created to identify each class individually, or all classes together using a multiclass model. Coefficients and cutoffs were established on a Robust Multichip Average (RMA) normalized TNBC training data set consisting of 14 cohorts (N=386) and then applied to a seven-cohort validation data set (N=201).
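As an illustration of why an elastic-net regularized linear model can give a good fit with few genes, the following is a minimal sketch, not the authors' implementation. It fits one binary "subtype versus rest" indicator by coordinate descent on simulated data. The function names, the penalty settings (`lam`, `alpha`), and the simulated cohort are all assumptions made for this example; the abstract does not report them.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_elastic_net(X, y, lam=0.2, alpha=0.5, n_sweeps=100):
    """Coordinate descent for an elastic-net penalized linear model.

    Minimizes (1/2n)*||y - b0 - X b||^2
              + lam * (alpha*||b||_1 + (1 - alpha)/2 * ||b||_2^2).
    lam and alpha are illustrative values, not taken from the abstract.
    """
    n, p = len(X), len(X[0])
    columns = [[row[j] for row in X] for j in range(p)]
    mean_sq = [sum(v * v for v in col) / n for col in columns]
    intercept = sum(y) / n
    coef = [0.0] * p
    residual = [yi - intercept for yi in y]
    for _ in range(n_sweeps):
        for j, col in enumerate(columns):
            if mean_sq[j] == 0.0:
                continue
            # Correlation of gene j with the residual, adding back gene j's own fit.
            rho = sum(c * r for c, r in zip(col, residual)) / n + mean_sq[j] * coef[j]
            # The L1 part zeroes weak genes; the L2 part shrinks the rest.
            updated = soft_threshold(rho, lam * alpha) / (mean_sq[j] + lam * (1.0 - alpha))
            delta = updated - coef[j]
            if delta != 0.0:
                residual = [r - c * delta for c, r in zip(col, residual)]
                coef[j] = updated
        # The intercept is not penalized: absorb the mean residual into it.
        shift = sum(residual) / n
        intercept += shift
        residual = [r - shift for r in residual]
    return intercept, coef


def simulate_cohort(n_samples=400, n_genes=8, seed=0):
    """Hypothetical data: only genes 0 and 1 carry subtype information."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(n_samples)]
    y = [1.0 if row[0] - row[1] > 0.0 else 0.0 for row in X]
    return X, y


X, y = simulate_cohort()
intercept, coef = fit_elastic_net(X, y)
kept = [j for j, c in enumerate(coef) if c != 0.0]
print("genes with non-zero coefficients:", kept)
```

In this simulation the L1 component of the penalty is expected to set the coefficients of uninformative genes exactly to zero, which is the behavior that makes elastic-net models attractive for building reduced gene signatures. A per-sample score would then be compared against a cutoff chosen on the training data.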
A reference gene set was chosen using three of the training cohorts with the criteria of low intra- and inter-cohort variation, as well as overall low coefficient of variation, low probe-to-probe variability, expression greater than the cohort mean, and functional diversity. New cutoffs were determined using these same three cohorts, and normalization with the reference genes was tested using an additional two cohorts from the training data.

Results: In the RMA-normalized validation data set, all models showed significant classification (Fisher exact test, P<0.0001). Specificity for the individual class models ranged from 88% (M) to 95% (LAR), while the multiclass model resulted in a 12% misclassification error rate. As was seen in the initial clustering in the training data set, there was notable overlap between the subtyping of BL1 and M. On the two discovery cohorts normalized with the reference gene set, the specificity for the individual class models ranged from 90% (M) to 100% (LAR). For the multiclass model, the misclassification error was 18%.

Conclusions: These results indicate that information conveyed in the initial clustering algorithm can be similarly obtained in a single patient sample using a reduced gene set normalized to internal controls. Future work will determine the biologic and clinical utility of this assay for patient management.

Citation Format: Rob S Seitz, David R Hout, Stephan W Morris, Rebecca B Smith, Brian D Lehmann, Xi Chen, Jennifer A Pietenpol, Brian Z Ring. Creation of a robust algorithm utilizing minimal gene sets normalized against a reference gene set to identify triple-negative breast cancer (TNBC) subtypes [abstract]. In: Proceedings of the Thirty-Seventh Annual CTRC-AACR San Antonio Breast Cancer Symposium: 2014 Dec 9-13; San Antonio, TX. Philadelphia (PA): AACR; Cancer Res 2015;75(9 Suppl):Abstract nr P6-08-04.
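The single-sample normalization and the reported performance measures can be sketched as follows. This is a hedged illustration only: the abstract does not state how expression values are normalized against the reference genes, so subtracting the mean log-scale expression of the reference genes is an assumption, and the gene names, expression values, and subtype calls below are hypothetical.

```python
from statistics import mean


def normalize_to_reference(sample, reference_genes):
    """Express each signature gene relative to the sample's own reference genes.

    Assumes log-scale expression values, so normalization is a subtraction
    (an assumption for this sketch, not a detail given in the abstract).
    """
    ref_level = mean(sample[g] for g in reference_genes)
    return {g: v - ref_level for g, v in sample.items() if g not in reference_genes}


def specificity(true_labels, predicted_labels, subtype):
    """True negatives / (true negatives + false positives) for one subtype."""
    tn = fp = 0
    for truth, call in zip(true_labels, predicted_labels):
        if truth != subtype:
            if call == subtype:
                fp += 1
            else:
                tn += 1
    return tn / (tn + fp)


def misclassification_rate(true_labels, predicted_labels):
    """Fraction of samples whose multiclass call differs from the true label."""
    wrong = sum(t != p for t, p in zip(true_labels, predicted_labels))
    return wrong / len(true_labels)


# Hypothetical sample: two signature genes and two reference genes.
reference = {"REF_1", "REF_2"}
sample = {"SIG_A": 9.5, "SIG_B": 6.0, "REF_1": 8.0, "REF_2": 7.0}
# The same sample with a uniform +1.0 shift, mimicking a batch or platform offset.
shifted = {g: v + 1.0 for g, v in sample.items()}

print(normalize_to_reference(sample, reference))
print(normalize_to_reference(shifted, reference))

# Hypothetical subtype calls for six samples.
truth = ["LAR", "M", "BL1", "M", "BL2", "LAR"]
calls = ["LAR", "M", "M", "M", "BL2", "BL1"]
print(specificity(truth, calls, "M"))
print(misclassification_rate(truth, calls))
```

Because the reference level is computed within each sample, a uniform offset cancels out, which is what allows fixed cutoffs to be applied to a single patient sample without renormalizing against a cohort.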
