Deep learning-based six-type classifier for lung cancer and mimics from histopathological whole slide images: a retrospective study

Huan Yang,Chenghao Lin,Lili Chen,Zhiqiang Cheng,Jianbo Wang,Zunfu Ke,Yangshan Chen,Yuefeng Wang,Leilei Huang,Minglei Yang,Sui Peng,Weizhong Li

doi:10.1186/s12916-021-01953-2

Abstract

BackgroundTargeted therapy and immunotherapy put forward higher demands for accurate lung cancer classification, as well as benign versus malignant disease discrimination. Digital whole slide images (WSIs) witnessed the transition from traditional histopathology to computational approaches, arousing a hype of deep learning methods for histopathological analysis. We aimed at exploring the potential of deep learning models in the identification of lung cancer subtypes and cancer mimics from WSIs.MethodsWe initially obtained 741 WSIs from the First Affiliated Hospital of Sun Yat-sen University (SYSUFH) for the deep learning model development, optimization, and verification. Additional 318 WSIs from SYSUFH, 212 from Shenzhen People’s Hospital, and 422 from The Cancer Genome Atlas were further collected for multi-centre verification. EfficientNet-B5- and ResNet-50-based deep learning methods were developed and compared using the metrics of recall, precision, F1-score, and areas under the curve (AUCs). A threshold-based tumour-first aggregation approach was proposed and implemented for the label inferencing of WSIs with complex tissue components. Four pathologists of different levels from SYSUFH reviewed all the testing slides blindly, and the diagnosing results were used for quantitative comparisons with the best performing deep learning model.ResultsWe developed the first deep learning-based six-type classifier for histopathological WSI classification of lung adenocarcinoma, lung squamous cell carcinoma, small cell lung carcinoma, pulmonary tuberculosis, organizing pneumonia, and normal lung. The EfficientNet-B5-based model outperformed ResNet-50 and was selected as the backbone in the classifier. Tested on 1067 slides from four cohorts of different medical centres, AUCs of 0.970, 0.918, 0.963, and 0.978 were achieved, respectively. The classifier achieved high consistence to the ground truth and attending pathologists with high intraclass correlation coefficients over 0.873.ConclusionsMulti-cohort testing demonstrated our six-type classifier achieved consistent and comparable performance to experienced pathologists and gained advantages over other existing computational methods. The visualization of prediction heatmap improved the model interpretability intuitively. The classifier with the threshold-based tumour-first label inferencing method exhibited excellent accuracy and feasibility in classifying lung cancers and confused nonneoplastic tissues, indicating that deep learning can resolve complex multi-class tissue classification that conforms to real-world histopathological scenarios.

Highlights

Targeted therapy and immunotherapy put forward higher demands for accurate lung cancer classification, as well as benign versus malignant disease discrimination
Internal cohort testing A total of 741 lung-derived digital whole slide images (WSIs), consisting of 512 tumorous tissues, 130 inflammatory tissues, and 99 normal tissues from the First Affiliated Hospital of Sun Yat-sen University (SYSUFH), constituted the initial dataset and were randomly divided into the training (n = 511 slides), validation (n = 115 slides), and internal testing (SYSU1) (n = 115 slides) subsets (Table 2)
The WSIs for training and validation were annotated by experienced pathologists and reviewed by the head of the Pathology Department at SYSUFH, and only Region of interest (ROI) were extracted and tessellated into small 256- × 256-pixel tiles at × 20 magnification as inputs of the EfficientNet-B5 network

Summary

Introduction

Targeted therapy and immunotherapy put forward higher demands for accurate lung cancer classification, as well as benign versus malignant disease discrimination. Digital whole slide images (WSIs) witnessed the transition from traditional histopathology to computational approaches, arousing a hype of deep learning methods for histopathological analysis. We aimed at exploring the potential of deep learning models in the identification of lung cancer subtypes and cancer mimics from WSIs. Lung cancer is the leading killer-cancer worldwide and referred to either non-small cell lung cancer (NSCLC) or small cell lung cancer (SCLC) customarily. With the emerging targeted therapy and immunotherapy, accurate morphological classification is in urgent need [1]. The digitization of histopathological slides is shifting the way pathologists work and allowing artificial intelligence (AI) to integrate with traditional laboratory workflows

Objectives

Methods

Results

Discussion

Conclusion