Abstract Background: The tumor microenvironment is key to understanding cancer biology and may be used clinically. Quantification of tissue composition is usually based on either visual pathological review (VPR) or deconvolution of whole-genome molecular data. Although the former is a direct measurement, it has modest reproducibility, while the latter is an indirect measurement of unclear accuracy that is expensive and not always available. Here we test digital pathology coupled with machine learning as a new tool to assess tissue composition. Methods: As part of the Stratification in COloRecTal cancer (S:CORT) programme, a set of over 500 colorectal cancer (CRC) archival paraffin blocks from resections and biopsies was sequentially sectioned for Hematoxylin and Eosin (H&E) staining, RNA extraction, a second H&E and DNA extraction. RNA expression microarrays, targeted DNA sequencing and DNA methylation arrays were applied. Tissue composition from the H&Es was obtained by VPR by expert pathologists and by a deep neural net (DNN) algorithm after supervised training on >1,500 tissue areas from the S:CORT, TCGA, TEM and CORGI CRC cohorts. Tumor purity estimates were obtained from RNA and methylation arrays. Results: DNN estimates, including area and cell counts, were obtained for tumor, desmoplastic stroma, inflamed stroma, mucin/hypocellular stroma, muscle, necrosis and white space. An average of 6.8×10⁵ total cells (range: 1.2×10⁴ to 2.8×10⁶) and 1.2×10⁵ total cells (range: 7.2×10⁴ to 1.8×10⁶) were classified for resections and biopsies, respectively. Analyses performed twice on the same H&Es yielded matching results (r=1.0). Comparison of the paired first and second H&Es showed very high correlations (r~0.9), and total cell counts correlated with DNA and RNA extraction yields (r~0.6). Tumor purity estimates by VPR correlated moderately with DNN estimates (r~0.5) but were underestimated and highly variable.
As a result, copy number adjusted for VPR purity tended to be overestimated compared with copy number adjusted for DNN estimates. The improved performance of DNN is reflected in its accurate capture of the non-linear association between area and cell counts in invasive cancer. In contrast, tumor purity estimates derived from RNA or DNA methylation arrays correlated better with DNN (r~0.6), but both overestimated purity by up to three-fold in cases with low cell counts. Conclusions: Tissue composition analysis with DNN allows analytical robustness, automation and standardization, and provides very high reproducibility at single-cell resolution. DNN-based estimation of tumor purity is more accurate than VPR or extrapolation from molecular data derived from genome-wide omic platforms, which tend to under- and overestimate tumor purity, respectively. DNN could be used to better plan and assess downstream molecular analyses and to investigate tissue-based metrics as potential clinical biomarkers in clinical trials. Citation Format: Enric Domingo, Aikaterini Chatzipli, Susan Richman, Andrew Blake, Claire Hardy, Celina Whalley, Keara Redmon, Ian Tomlinson, Philip Dunne, Steven Walker, Andrew Beggs, Ultan McDermott, Graeme I. Murray, Leslie M. Samuel, Matthew Seymour, Philip Quirke, Tim Maughan, Viktor H. Koelzer. Assessment of tissue composition with digital pathology in colorectal cancer [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2019; 2019 Mar 29-Apr 3; Atlanta, GA. Philadelphia (PA): AACR; Cancer Res 2019;79(13 Suppl):Abstract nr 4446.