Abstract

Deep learning has revolutionized digital pathology, enabling automated analysis of hematoxylin and eosin (H&E)-stained whole slide images (WSIs) for tasks such as classifying cancers into clinical subtypes. In such analyses, WSIs are broken down into smaller images, hereafter called tiles, for efficient processing, and each tile is encoded by a deep learning backbone. To reconstruct a slide-level representation, a widely applied approach is to combine tile-level features using an attention-based deep learning model trained separately for each downstream prediction task. These training strategies are (a) computationally intensive, (b) challenging to optimize, (c) sensitive to stain variations across datasets, and (d) dependent on sufficiently large labeled datasets for supervised training, which are not always available. We propose SAMPLing of multiscale empirical distributions for LEarning Representations (SAMPLER), a fully statistical approach that generates WSI representations by encoding the empirical cumulative distribution function (CDF) of multiscale tile features. We evaluated this approach by training logistic regression classifiers on SAMPLER representations. SAMPLER-based classifiers accurately separated subtypes of breast carcinoma (BRCA: AUC = 0.911 ± 0.029), non-small cell lung carcinoma (NSCLC: AUC = 0.940 ± 0.018), and renal cell carcinoma (RCC: AUC = 0.987 ± 0.006) in diagnostic slides from The Cancer Genome Atlas (TCGA). Performance was similar to that of fully deep learning attention models, while SAMPLER was more than 100 times faster. We further validated our models on external test sets. Histopathological review confirmed that SAMPLER-identified high-attention tiles contain morphological features specific to the tumor type, whereas low-attention tiles contain fibrous stroma, blood, or tissue-folding artifacts.
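The abstract says SAMPLER encodes the empirical CDF of multiscale tile features and then trains logistic regression on the result, but it does not spell out the exact encoding. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes each feature's empirical CDF is summarized by ten evenly spaced quantiles (0.05, 0.15, ..., 0.95) per scale, and it uses a hand-rolled NumPy logistic regression in place of whatever solver the authors used. The names `encode_slide`, `fit_logistic_regression`, `predict_proba`, and `QUANTILE_LEVELS` are hypothetical, and the demo data are synthetic.

```python
import numpy as np

# Hypothetical quantile grid (an assumption, not taken from the abstract):
# ten evenly spaced levels 0.05, 0.15, ..., 0.95 sampled from each feature's empirical CDF.
QUANTILE_LEVELS = np.linspace(0.05, 0.95, 10)


def encode_slide(tile_features_by_scale, levels=QUANTILE_LEVELS):
    """Fixed-length slide vector from per-scale tile features (illustrative sketch).

    tile_features_by_scale: list with one (n_tiles, n_features) array per scale.
    Each feature column is summarized by its empirical quantiles, i.e. samples of
    the inverse empirical CDF, so the output length does not depend on n_tiles.
    """
    parts = []
    for feats in tile_features_by_scale:
        feats = np.asarray(feats, dtype=float)
        q = np.quantile(feats, levels, axis=0)  # shape (len(levels), n_features)
        parts.append(q.T.ravel())  # keep each feature's quantiles contiguous
    return np.concatenate(parts)


def fit_logistic_regression(X, y, lr=0.1, n_iter=2000, l2=1e-3):
    """L2-regularized logistic regression fitted by batch gradient descent."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mu, sd = X.mean(axis=0), X.std(axis=0) + 1e-8  # standardize each feature
    Z = (X - mu) / sd
    w, b = np.zeros(Z.shape[1]), 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))  # predicted probabilities
        w -= lr * (Z.T @ (p - y) / len(y) + l2 * w)
        b -= lr * np.mean(p - y)
    return mu, sd, w, b


def predict_proba(model, X):
    """Probability of class 1 for each row of X under a fitted model."""
    mu, sd, w, b = model
    Z = (np.asarray(X, dtype=float) - mu) / sd
    return 1.0 / (1.0 + np.exp(-(Z @ w + b)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    def synthetic_slide(label):
        # Two hypothetical scales with 4 features each; class-1 tiles are shifted by +1.
        n_tiles = rng.integers(50, 150)
        return [rng.normal(loc=label, size=(n_tiles, 4)) for _ in range(2)]

    labels = np.array([0, 1] * 20)
    X = np.stack([encode_slide(synthetic_slide(lab)) for lab in labels])
    model = fit_logistic_regression(X[:30], labels[:30])
    acc = np.mean((predict_proba(model, X[30:]) > 0.5) == labels[30:])
    print(X.shape, acc)
```

Because every feature is reduced to a fixed set of quantiles, slides with different tile counts map to vectors of the same length, so in this sketch no attention model has to be trained to pool tiles; the only learned component is the linear classifier.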
SAMPLER is a fast and accurate approach for analyzing WSIs, offering substantially better scalability than attention-based methods for digital pathology analysis.

Citation Format: Patience Mukashyaka, Todd B. Sheridan, Ali Foroughi pour, Jeffrey H. Chuang. SAMPLER: Unsupervised representations of whole slide images for tumor phenotype prediction [abstract]. In: Proceedings of the AACR Special Conference in Cancer Research: Translating Cancer Evolution and Data Science: The Next Frontier; 2023 Dec 3-6; Boston, Massachusetts. Philadelphia (PA): AACR; Cancer Res 2024;84(3 Suppl_2):Abstract nr B039.
