Multiple Instance Learning Model Research Articles

Abstract PO3-07-04: Prediction of PAM50 molecular subtypes from H&E-stained breast cancer specimens using tumor microenvironment features and additive multiple instance learning models

Background: PAM50, a 50-gene signature, classifies breast cancers into one of five subtypes (basal, luminal A, luminal B, HER2-enriched, and normal-like). It reveals information about underlying tumor biology and has emerged as a key prognostic indicator influencing treatment decisions. There is growing interest in bridging the gap between expression-based metrics and histopathology, and immunohistochemistry (IHC) and sequencing-based approaches have been proposed for this purpose. However, IHC and sequencing-based approaches require additional tissue and specialized processing and/or analysis, whereas hematoxylin and eosin (H&E)-stained slides are ubiquitously used by pathologists for cancer diagnosis. Here, we describe a computer vision-based approach to predict PAM50 classification using H&E-stained whole slide images (WSIs).

Methods: We obtained expression-based PAM50 subtype labels and corresponding H&E-stained WSIs for 961 breast carcinomas from the TCGA BRCA cohort. We used two separate machine learning (ML) approaches to predict PAM50 subtypes from WSIs. In the first approach, we deployed previously trained PathExplore models to extract quantitative human-interpretable features (HIFs) that summarize the tumor microenvironment (TME), and then trained random forest classification models on these HIFs to predict PAM50 subtypes. For the second approach, we developed additive multiple instance learning (aMIL) models. Additionally, we explored the effects of PAM50 subtype labeling and aggregation strategies beyond the 5-class approach. Our 3-class approach combines Luminal A and B, as seen in IHC efforts to increase agreement with PAM50 assays, while excluding Normal, a category containing few and heterogeneous samples. We also performed binary classification for each subtype in the 3-class model (e.g. luminal vs. other). Slides were split into training (60%), validation (20%), and test (20%) sets, stratified by PAM50 labels. Model performance was assessed using the area under the receiver operating characteristic curve (AUROC) on the held-out test set, using a one vs. rest approach for multi-class models. To establish a baseline for PAM50 prediction, we developed random forest classification models using only clinical covariates (tumor stage, histologic grade, histological subtype, and BRCA1/2 status).

Results: We compared the performance of our two ML models (HIF and aMIL) to that of the baseline model, and we report the AUROC values in Table 1. Both models performed well in predicting Basal, Luminal A, Luminal B, and Luminal (A+B), while performance was weaker for the HER2 and Normal classifications. The three-class model predicted Luminal classifications better than the five-class model, which separates Luminal A and B. Although simplifying a classification problem to a binary use case typically improves performance, this was not observed for any of the PAM50 subtypes.

Conclusions: These results demonstrate that AI-powered digital pathology can accurately and reproducibly perform molecular-based classification tasks, such as predicting PAM50 classifications, using WSIs, suggesting a more efficient path toward clinically relevant breast cancer characterization.

Table 1. Performance of all models in predicting PAM50 molecular subtypes. AUROC values are shown. Shaded cells represent the best test-set performance for each class (row).

Citation Format: Maria Guramare, Syed Ashar Javed, Christian Kirkup, Dinkar Juyal, Jacqueline Brosnan-Cashman, Victoria Mountain, Ryan Leung, Bahar Rahsepar, John Abel, Amaro Taylor-Weiner, Jake Conway. Prediction of PAM50 molecular subtypes from H&E-stained breast cancer specimens using tumor microenvironment features and additive multiple instance learning models [abstract]. In: Proceedings of the 2023 San Antonio Breast Cancer Symposium; 2023 Dec 5-9; San Antonio, TX. Philadelphia (PA): AACR; Cancer Res 2024;84(9 Suppl):Abstract nr PO3-07-04.
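
The label aggregation and evaluation protocol described in the Methods (3-class merging, a stratified 60/20/20 split, and one vs. rest AUROC) can be sketched roughly as follows. This is a minimal standard-library illustration on synthetic scores, not the authors' pipeline: the helper names (`to_three_class`, `stratified_split`, `auroc`, `one_vs_rest_auroc`), the label strings, the seeds, and the toy data are all assumptions made for the example.

```python
import random
from collections import defaultdict

# Hypothetical label strings; the 3-class scheme merges Luminal A and B
# and excludes Normal-like samples, as described in the abstract.
THREE_CLASS = {"LumA": "Luminal", "LumB": "Luminal", "Basal": "Basal", "HER2": "HER2"}

def to_three_class(labels):
    """Return (kept indices, merged labels), dropping Normal-like samples."""
    kept = [(i, THREE_CLASS[y]) for i, y in enumerate(labels) if y in THREE_CLASS]
    return [i for i, _ in kept], [y for _, y in kept]

def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split sample indices into train/val/test, preserving label proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, val, test = [], [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_train = round(fractions[0] * len(idx))
        n_val = round(fractions[1] * len(idx))
        train += idx[:n_train]
        val += idx[n_train:n_train + n_val]
        test += idx[n_train + n_val:]  # remainder goes to the test set
    return train, val, test

def auroc(scores, positives):
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, p in zip(scores, positives) if p]
    neg = [s for s, p in zip(scores, positives) if not p]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def one_vs_rest_auroc(class_scores, labels):
    """class_scores maps each class name to a list of per-sample scores."""
    return {c: auroc(s, [y == c for y in labels]) for c, s in class_scores.items()}

# Synthetic demo: 100 toy samples with noisy per-class scores (not real data).
labels5 = ["LumA"] * 40 + ["LumB"] * 20 + ["Basal"] * 20 + ["HER2"] * 10 + ["Normal"] * 10
kept, labels3 = to_three_class(labels5)
train, val, test = stratified_split(labels3)

rng = random.Random(1)
classes = sorted(set(labels3))
scores = {c: [float(y == c) + rng.gauss(0, 0.5) for y in labels3] for c in classes}
test_auc = one_vs_rest_auroc(
    {c: [scores[c][i] for i in test] for c in classes},
    [labels3[i] for i in test],
)
```

In practice a library routine would normally replace the hand-written split and AUROC; they are spelled out here only to make the one vs. rest computation explicit.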

Abstract B010: Spatially-resolved prediction of gene expression signatures in H&E whole slide images using additive multiple instance learning models

Background: Newly developed molecular technologies, such as spatial multiplexed assays and single-cell sequencing, have provided increased resolution and output for tumor analysis. However, these assays are often cost-prohibitive, making them inadequate ways to detect clinical biomarkers. In contrast, hematoxylin and eosin (H&E) staining is routine for cancer diagnostics but does not provide molecular information, potentially limiting its utility in the targeted therapy era. Machine learning models could augment the information revealed by H&E, potentially allowing molecular information to be inferred. Here, we describe a novel approach to predict gene expression signatures (GES) in H&E-stained whole slide images (WSI) using an additive multiple instance learning (aMIL) end-to-end model (1). We present results in breast cancer predicting spatially resolved levels of a TGFb GES, a proposed biomarker for TGFb antagonists and immunotherapy.

Methods: H&E-stained WSI from the TCGA BRCA cohort (N=1090) were split into training (60%), validation (20%), and test (20%) sets. TGFb-CAF GES (2) were computed, and median expression cut-off on training data was used to define "high" and "low" TGFb-CAF levels. aMIL models were optimized in training data for the binary classification of TGFb-CAF levels. Top-performing model iterations were compared on the validation set, and the optimal model was deployed on the held-out test set. aMIL heatmaps were merged with PathExplore tumor microenvironment (TME) model heatmaps to characterize cell, tissue, and nuclear spatial distributions and morphology in terms of human interpretable features (HIFs). HIFs were extracted from high-importance patches (top 25% of aMIL scores) for both TGFb-CAF-high and -low.

Results: Our model accurately predicted TGFb-CAF-high vs. -low BRCA samples (test AUROC=0.80). Also, model deployment on WSI provided interpretable heatmaps depicting TGFb-CAF predictions in tissue, providing spatial resolution to TGFb-CAF expression. Patches contributing most to TGFb-CAF-high prediction were enriched for cancer stroma, as well as cancer-infiltrating and stromal fibroblasts. Furthermore, significant differences in HIFs relating to fibroblast nucleus size and lymphocyte nucleus shape were observed between patches contributing most to TGFb-CAF-high and -low predictions.

Conclusions: We have developed a method to predict GES with spatial resolution in H&E-stained WSI. aMIL models provide exact marginal contributions of each patch towards every class prediction, allowing downstream analysis of tissue, cell, and nuclear features and providing biological interpretability not found in typical black-box models. The ability of our method to detect GES in H&E-stained WSI allows complex molecular information to be detected in routine clinical specimens with spatial specificity, providing a means for GES to potentially be realized as clinical biomarkers.
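
The statement that aMIL models give exact per-patch contributions to every class can be illustrated with a small sketch. It assumes one common additive formulation, in which each attention-weighted patch embedding is scored by a patch-level classifier and the slide-level logits are the sum of those patch scores; the abstract does not specify the architecture, so the function `additive_mil_forward`, the layer sizes, the random weights, and the synthetic data below are all hypothetical. The sketch also shows a training-set median cut-off and a top-25% patch selection on toy values.

```python
import math
import random
import statistics

def additive_mil_forward(patches, w_attn, w_hidden, w_out):
    """Return (bag_logits, per-patch class contributions) for one slide.

    patches:  list of patch embeddings, each of length d
    w_attn:   attention weights, length d
    w_hidden: one weight vector of length d per hidden unit
    w_out:    one weight vector (length = number of hidden units) per class
    """
    # Softmax attention over patches (numerically stabilised).
    raw = [sum(x * w for x, w in zip(p, w_attn)) for p in patches]
    peak = max(raw)
    exp = [math.exp(r - peak) for r in raw]
    total = sum(exp)
    attn = [e / total for e in exp]

    # Patch-level classifier applied to each attention-weighted embedding.
    contributions = []
    for a, p in zip(attn, patches):
        weighted = [a * x for x in p]
        hidden = [math.tanh(sum(x * w for x, w in zip(weighted, col))) for col in w_hidden]
        contributions.append([sum(h * w for h, w in zip(hidden, col)) for col in w_out])

    # Slide-level logits are the plain sum of patch contributions, so each
    # patch's share of every class logit can be read off directly.
    bag_logits = [sum(c[k] for c in contributions) for k in range(len(w_out))]
    return bag_logits, contributions

rng = random.Random(0)

# Median cut-off computed on (synthetic) training signature scores only.
train_expr = [rng.gauss(0, 1) for _ in range(100)]
cutoff = statistics.median(train_expr)
train_labels = [int(v > cutoff) for v in train_expr]  # 1 = "high", 0 = "low"

# One synthetic slide: 200 patches with 8-dimensional embeddings.
d, h, n = 8, 4, 200
patches = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
w_attn = [rng.gauss(0, 1) for _ in range(d)]
w_hidden = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(h)]
w_out = [[rng.gauss(0, 1) for _ in range(h)] for _ in range(2)]
bag_logits, contributions = additive_mil_forward(patches, w_attn, w_hidden, w_out)

# High-importance patches: top 25% by contribution to the "high" class (index 1).
high_scores = [c[1] for c in contributions]
threshold = statistics.quantiles(high_scores, n=4)[2]
top_patches = [i for i, s in enumerate(high_scores) if s >= threshold]
```

Because the slide-level logit is a sum, the `contributions` list can be mapped back to patch coordinates to draw a heatmap per class, and `top_patches` marks the patches from which downstream features would be summarised.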

Articles published on Multiple Instance Learning Model

Weakly supervised multiple instance learning model with generalization ability for clinical adenocarcinoma screening on serous cavity effusion pathology

MYC Rearrangement Prediction From LYSA Whole Slide Images in Large B-Cell Lymphoma: A Multicentric Validation of Self-supervised Deep Learning Models

Unveiling the Power of Model-Agnostic Multiscale Analysis for Enhancing Artificial Intelligence Models in Breast Cancer Histopathology Images.

Weakly supervised deep learning image analysis can differentiate melanoma from naevi on haematoxylin and eosin-stained histopathology slides.

Multiple Instance Pathology Image Diagnosis Model based on Channel Attention and Data Augmentation

Interpretable artificial intelligence-based analysis for morphologic classification of neuroblastic tumors.

Abstract PO3-07-04: Prediction of PAM50 molecular subtypes from H&E-stained breast cancer specimens using tumor microenvironment features and additive multiple instance learning models

Pseudo-Bag Mixup Augmentation for Multiple Instance Learning-Based Whole Slide Image Classification.

Development and validation of a deep learning-based microsatellite instability predictor from prostate cancer whole-slide images

ProDiv: Prototype-driven consistent pseudo-bag division for whole-slide image classification

Boosting Multiple Instance Learning Models for Whole Slide Image Classification: A Model-Agnostic Framework Based on Counterfactual Inference

Shapley Values-enabled Progressive Pseudo Bag Augmentation for Whole-Slide Image Classification.

Abstract B010: Spatially-resolved prediction of gene expression signatures in H&E whole slide images using additive multiple instance learning models

Abstract B016: AI analysis of histological images accurately identifies luminal subtype urothelial carcinomas characterized by high PPARG expression

An end-to-end approach to combine attention feature extraction and Gaussian Process models for deep multiple instance learning in CT hemorrhage detection

Attention2Minority: A salient instance inference-based multiple instance learning for classifying small lesions in whole slide images

HistoMIL: A Python package for training multiple instance learning models on histopathology slides

Data-Efficient Computational Pathology Platform for Faster and Cheaper Breast Cancer Subtype Identifications: Development of a Deep Learning Model.

Multiple Instance Learning with Trainable Soft Decision Tree Ensembles

Domain-Specific Pre-training Improves Confidence in Whole Slide Image Classification.
