Abstract

Although deep learning (DL) has demonstrated impressive diagnostic performance for a variety of computational pathology tasks, this performance often deteriorates markedly on whole slide images (WSIs) generated at external test sites. This phenomenon is due in part to domain shift, wherein differences in test-site pre-analytical variables (e.g., slide scanner, staining procedure) result in WSIs with notably different visual presentations compared to training data. To ameliorate pre-analytical variance, approaches such as CycleGAN can be used to calibrate visual properties of images between sites, with the intent of improving DL classifier generalizability. In this work, we present a new approach termed Multi-Site Cross-Organ Calibration based Deep Learning (MuSClD) that performs calibration using WSIs of an off-target organ created at the same site as the on-target organ, based on the assumption that cross-organ slides are subjected to a common set of pre-analytical sources of variance. We demonstrate that by using an off-target organ from the test site to calibrate training data, the domain shift between training and testing data can be mitigated. Importantly, this strategy uniquely guards against potential data leakage introduced during calibration, wherein information only available in the testing data is imparted to the training data. We evaluate MuSClD in the context of the automated diagnosis of non-melanoma skin cancer (NMSC). Specifically, we evaluate MuSClD for identifying and distinguishing (a) basal cell carcinoma (BCC), (b) in-situ squamous cell carcinomas (SCC-In Situ), and (c) invasive squamous cell carcinomas (SCC-Invasive), using an Australian (training, n=85) and a Swiss (held-out testing, n=352) cohort. Our experiments reveal that MuSClD reduces the Wasserstein distances between sites in terms of color, contrast, and brightness metrics, without imparting noticeable artifacts to training data. The NMSC-subtyping performance is statistically improved as a result of MuSClD in terms of one-vs.-rest AUC: BCC (0.92 vs. 0.87, p=0.01), SCC-In Situ (0.87 vs. 0.73, p=0.15), and SCC-Invasive (0.92 vs. 0.82, p=1e-5). Compared to baseline NMSC-subtyping with no calibration, the internal validation results of MuSClD (BCC (0.98), SCC-In Situ (0.92), and SCC-Invasive (0.97)) suggest that while domain shift indeed degrades classification performance, our on-target calibration using off-target tissue can safely compensate for pre-analytical variabilities, while improving the robustness of the model.
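
To make the between-site comparison concrete, the following is a minimal sketch of how a one-dimensional Wasserstein distance over per-tile brightness and contrast could be computed. The abstract does not specify how these metrics are defined or sampled, so the function names (`brightness`, `tile_metrics`, `wasserstein_1d`), the use of mean RGB intensity as brightness, the use of its standard deviation as contrast, and the equal-sample-size restriction are all assumptions made for illustration, not the authors' implementation.

```python
import math


def brightness(pixel):
    """Mean of the R, G, B channels of one pixel (assumed brightness proxy)."""
    r, g, b = pixel
    return (r + g + b) / 3.0


def tile_metrics(tile):
    """Return (brightness, contrast) for a tile given as a list of RGB tuples.

    Brightness is the mean pixel brightness; contrast is the population
    standard deviation of pixel brightness. Both definitions are assumptions.
    """
    values = [brightness(p) for p in tile]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(variance)


def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-sized 1-D samples.

    For empirical distributions with the same number of points, this is the
    mean absolute difference between the sorted samples.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    if not xs:
        raise ValueError("samples must be non-empty")
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in pairs) / len(xs)


def site_distance(tiles_a, tiles_b):
    """Wasserstein-1 distances for brightness and contrast between two sites.

    Each argument is a list of tiles; each tile is a list of RGB tuples.
    """
    metrics_a = [tile_metrics(t) for t in tiles_a]
    metrics_b = [tile_metrics(t) for t in tiles_b]
    return {
        "brightness": wasserstein_1d([m[0] for m in metrics_a],
                                     [m[0] for m in metrics_b]),
        "contrast": wasserstein_1d([m[1] for m in metrics_a],
                                   [m[1] for m in metrics_b]),
    }
```

Under these assumptions, calling `site_distance` once on uncalibrated training tiles against test-site tiles, and once on calibrated training tiles against the same test-site tiles, would give the kind of before-and-after comparison the abstract describes; a smaller distance after calibration would indicate reduced domain shift for that metric.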
