Abstract
Background: Although several gene expression-based assays are validated for informing prognosis and treatment decisions for breast cancer (BC) patients, their uptake has been hampered by technical complexity and cost, particularly in underrepresented and low-resource settings. Here, we explored whether machine learning-derived features from standard hematoxylin and eosin (H&E)-stained images, used together with routinely available pathology data (pathomics), can infer clinically relevant tumor genomic assay results among BC patients from a sub-Saharan African population.
Methods: This study included 563 BC patients with diagnostic H&E-stained images, clinicopathological data, and NanoString gene expression data from the Ghana Breast Health Study (GBHS), a population-based case-control study that recruited BC patients between 2013 and 2015. H&E images were analyzed with high-accuracy machine learning algorithms to extract several prospectively selected, human-interpretable imaging features. These features characterized the tumor (e.g., average nuclear size, nuclear optical density, and nuclear roundness) and the stroma (e.g., degree of stromal cellularity, stromal cell phenotype, and extent of stromal desmoplasia, remodeling, and necrosis). NanoString technology was used to generate data on PAM50 subtype [luminal (A and B), non-luminal (HER2-enriched and Basal), and normal-like], the 21-gene recurrence score (RS), and TP53 pathway function. Multivariable logistic regression models were fitted in a discovery set (60% of the data) to develop classifiers for luminal (vs. other subtypes), non-luminal (vs. other subtypes), TP53 mutant-like (vs. wildtype-like), and RS (4th vs. 1st quartile) status. Classifier performance was then tested in a held-out internal validation set (40% of the data).
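The Methods above describe a standard supervised workflow: dichotomize an assay result, split patients 60/40 into discovery and validation sets, fit a multivariable logistic regression on the discovery set, and score discrimination by AUROC in both sets. The sketch below illustrates that workflow in plain Python under stated assumptions: each patient is already summarized as a numeric vector of pathomic features (the two synthetic features here are hypothetical stand-ins for, e.g., nuclear size and stromal cellularity), RS is dichotomized by keeping only 1st- and 4th-quartile patients, and the model is unpenalized logistic regression fitted by batch gradient descent. The abstract does not state the software, covariate coding, or confidence-interval method the authors used, so this is an illustrative sketch rather than the study's implementation.

```python
import math
import random
import statistics


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def rs_extreme_quartiles(scores):
    """Label 4th-quartile scores 1, 1st-quartile scores 0, middle quartiles None (excluded)."""
    q1, _, q3 = statistics.quantiles(scores, n=4)
    return [1 if s >= q3 else 0 if s <= q1 else None for s in scores]


def split_discovery_validation(X, y, validation_frac=0.4, seed=0):
    """Randomly hold out a fraction of patients (40% by default) for internal validation."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_val = round(len(idx) * validation_frac)
    val, disc = idx[:n_val], idx[n_val:]
    return ([X[i] for i in disc], [y[i] for i in disc],
            [X[i] for i in val], [y[i] for i in val])


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit an unpenalized multivariable logistic regression by batch gradient descent."""
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the mean log-loss is (p - y) * x, averaged over patients.
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(k):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, X):
    """Predicted probability of the positive class (e.g., TP53 mutant-like) per patient."""
    w, b = model
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


def auroc(y, scores):
    """AUROC as the Mann-Whitney probability that a positive outranks a negative (ties = 0.5)."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic, hypothetical standardized features for 563 patients.
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(563)]
    y = [1 if rng.random() < sigmoid(2.0 * a - 1.5 * b) else 0 for a, b in X]
    X_disc, y_disc, X_val, y_val = split_discovery_validation(X, y)
    model = fit_logistic(X_disc, y_disc)
    print("discovery AUROC:", round(auroc(y_disc, predict_proba(model, X_disc)), 2))
    print("validation AUROC:", round(auroc(y_val, predict_proba(model, X_val)), 2))
```

An AUROC of 0.5 corresponds to chance ranking and 1.0 to perfect separation. As in the Results, discovery-set AUROC is typically optimistic because the same patients were used to fit the model, which is why the held-out estimate matters; the confidence intervals reported there would additionally require an interval method (for example, bootstrap resampling) that the abstract does not specify.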
Results: In the discovery set, the pathomics-based classifiers achieved high, though varying, discriminatory accuracy [AUROC = 0.90 (0.86-0.94), 0.94 (0.91-0.97), 0.95 (0.91-0.98), and 0.96 (0.94-0.99) for TP53, non-luminal, RS, and luminal classification, respectively]. In the held-out validation set, the corresponding AUROC values were 0.82 (0.75-0.88), 0.87 (0.81-0.92), 0.85 (0.77-0.92), and 0.88 (0.83-0.93) for TP53, non-luminal, RS, and luminal classification, respectively. The TP53 classifier was correlated with the luminal (R = -0.73), non-luminal (R = 0.80), and RS (R = 0.77) classifiers. The distribution of the pathomics-based TP53 probability score varied considerably across PAM50 subtypes (P < 0.0001) and across RS categories within both ER+ (P = 0.009) and ER- (P = 0.006) BC in the validation set.
Conclusion: H&E staining is cost-effective and routinely performed as part of the diagnostic workup for BC patients. These results therefore open promising avenues for using interpretable, machine learning-based H&E imaging and pathology data to infer breast tumor genomic signatures and prognosis in low-resource settings. Further work is required to validate these findings in independent populations.
Citation Format: Mustapha Abubakar, Amber N. Hurson, Thomas U. Ahearn, Ebonee N. Butler, Alina M. Hamilton, Maire A. Duggan, Scott M. Lawrence, Ernest Adjei, Joe-Nat Clegg-Lamptey, Joel Yarney, Beatrice Wiafe-Addai, Baffour Awuah, Seth Wiafe, Kofi Nyarko, Francis Aitpillah, Daniel Ansong, Stephen Hewitt, Louise A. Brinton, Melissa A. Troester, Lawrence Edusei, Nicolas Titiloye, Jonine D. Figueroa, Montserrat Garcia-Closas. Pathomics-based classifiers for inferring breast cancer genomic assays and prognosis in sub-Saharan Africa: Results from the Ghana Breast Health Study [abstract]. In: Proceedings of the 17th AACR Conference on the Science of Cancer Health Disparities in Racial/Ethnic Minorities and the Medically Underserved; 2024 Sep 21-24; Los Angeles, CA. Philadelphia (PA): AACR; Cancer Epidemiol Biomarkers Prev 2024;33(9 Suppl):Abstract nr C022.