Early Development of a Machine Learning Approach to Quantify MYC Immunohistochemical Staining in Lymphoma

Bradley Drumheller, Leila Kutob, David Jaye, Elliott Burdette, Mohamed Amgad, Adam Perricone, Ahmed Aljudi, Conrad Shebelut, Cameron Neely

doi:10.1093/ajcp/aqaa137.034

Abstract

Recent data suggest that double expression of MYC and BCL2 proteins (DE) evaluated by quantitative immunohistochemistry (qIHC) may be a powerful marker of worse prognosis in diffuse large B-cell lymphoma (DLBCL). Testing for DE status, defined as >40% MYC+ and >50% BCL2+ tumor cells, is recommended in the WHO 2016 classification, and clinical trials are using DE scoring to assign therapy arms. However, other data suggest that significant variability in manual DE scoring diminishes its predictive value. Error sources include high interobserver variability (IOV) associated with field choice, discrimination of tumor immunoreactivity from adjacent non-neoplastic cells, cell-to-cell variability in staining intensity, crush artifacts, and necrosis. Thus, there is a need for standardized, reproducible approaches for DE scoring by qIHC. To address this need, we have begun developing a novel machine-learning approach to analyze IHC digital pathology whole-slide images, focusing initially on MYC IHC. Digital whole-slide images (400x) of 22 DLBCL cases were uploaded to a web-based annotation platform. Using all cases, one annotator created 138 regions of interest (ROIs), each containing approximately 200 nucleated cells and together representing a variety of tissue types. Eight pathologists were assigned the same 10 ROIs and annotated all nuclei within them; from these annotations, ground-truth seed nucleus labels (location, classification) were created for a validation set. Nuclei were classified as “tumor-positive”, “tumor-negative”, “non-tumor-positive”, “non-tumor-negative”, or “unknown”. This generated a set of 15,792 annotations with 1974 ± 272 (mean ± SD) labels per annotator. Agglomerative hierarchical clustering of these annotations yielded 2299 ground-truth seed locations. A maximum diameter of 3 mm/cluster was set by visual inspection of annotations. Of these seed locations, 1041 (45%) were detected by 8/8 annotators and, on average, 6/8 annotators agreed on class.
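The seed-generation step described above can be illustrated with a minimal sketch. The abstract does not state the linkage criterion or the software used, so this example assumes complete linkage (which bounds each cluster's diameter by the chosen threshold) and assumes the cluster centroid serves as the seed location. The function names `complete_linkage_clusters` and `seed_locations`, the toy coordinates, and the threshold value are hypothetical and are in arbitrary units.

```python
from itertools import combinations
from math import dist


def complete_linkage_clusters(points, max_diameter):
    """Agglomerative clustering with complete linkage (an assumption).

    Repeatedly merges the two clusters whose farthest pair of points is
    closest, and stops once the best merge would exceed max_diameter.
    Returns a list of clusters, each a list of indices into points.
    Naive implementation, intended only for small illustrative inputs.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Complete linkage: largest pairwise distance between two clusters.
        return max(dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        d, ia, ib = min(
            (linkage(clusters[a], clusters[b]), a, b)
            for a, b in combinations(range(len(clusters)), 2)
        )
        if d > max_diameter:
            break
        clusters[ia].extend(clusters[ib])
        del clusters[ib]  # ib > ia, so index ia stays valid
    return clusters


def seed_locations(points, clusters):
    """Centroid of each cluster, used here as the seed location (an assumption)."""
    seeds = []
    for members in clusters:
        xs = [points[i][0] for i in members]
        ys = [points[i][1] for i in members]
        seeds.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return seeds


# Toy annotation coordinates (hypothetical), pooled across annotators.
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (50, 50)]
clusters = complete_linkage_clusters(points, max_diameter=3)
seeds = seed_locations(points, clusters)
```

With a diameter cap, annotations of the same nucleus placed by different annotators collapse into one seed, while annotations farther apart than the cap remain separate seeds.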
302 ± 72 (mean ± SD) “tumor-positive” labels per annotator generated 382 seed locations, 178 (47%) of which were detected by 8/8 annotators, with an average of 7.5/8 agreeing on class. 286 ± 168 (mean ± SD) “tumor-negative” labels per annotator generated 336 seed locations, 195 (58%) of which were detected by 8/8 annotators, with an average of 5/8 agreeing on class. Among all classes, the “tumor-positive” label displayed the best overall label agreement, whereas the “tumor-negative” label yielded a similar localization rate but lower class agreement. These promising early findings provide a novel basis for quantifying IOV and for using multi-observer agreement to create a ground-truth validation set for a supervised machine learning approach to qIHC. Future efforts will use these data to optimize the validation set by rationally determining the number of additional annotations required, optimizing the number of annotators required per ROI, devising an adaptive approach to nuclear clustering based on nuclear density, and using the additional 31,422 annotations in hand from all annotators as a robust algorithm training set.
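The agreement figures reported above (seeds detected by all annotators, and the average number of annotators agreeing on class) could be computed along the following lines. This is a sketch under stated assumptions: the abstract does not define how class agreement was calculated, so the example assumes it is the size of the largest group of annotators assigning the same label to a seed. The data layout (one dictionary per seed, mapping annotator ID to label) and the function names are hypothetical.

```python
from collections import Counter


def summarize_seed(labels_by_annotator):
    """Summarize one seed.

    labels_by_annotator maps annotator ID -> class label for every annotator
    who marked this nucleus. Agreement is taken as the size of the largest
    group sharing a label (an assumption, not the authors' stated method).
    """
    counts = Counter(labels_by_annotator.values())
    consensus, n_agree = counts.most_common(1)[0]
    return {
        "n_detected": len(labels_by_annotator),
        "consensus": consensus,
        "n_agree": n_agree,
    }


def summarize_seeds(seeds, n_annotators):
    """Aggregate detection and class agreement over a non-empty list of seeds."""
    summaries = [summarize_seed(s) for s in seeds]
    detected_by_all = sum(1 for s in summaries if s["n_detected"] == n_annotators)
    return {
        "n_seeds": len(summaries),
        "detected_by_all": detected_by_all,
        "fraction_detected_by_all": detected_by_all / len(summaries),
        "mean_agreeing": sum(s["n_agree"] for s in summaries) / len(summaries),
    }


# Toy example with three annotators (hypothetical data).
toy_seeds = [
    {1: "tumor-positive", 2: "tumor-positive", 3: "tumor-positive"},
    {1: "tumor-negative", 2: "non-tumor-negative"},
]
summary = summarize_seeds(toy_seeds, n_annotators=3)
```

Reporting detection and class agreement separately, as the abstract does, distinguishes disagreement about whether a nucleus is present from disagreement about what it is.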
