Deep learning-based detection system for multiclass lesions on chest radiographs: comparison with observer readings.

Sohee Park, Joon Beom Seo, Jooae Choe, Woong Bae, Kyu-Hwan Jung, Kyung Hee Lee, Sang Min Lee

doi:10.1007/s00330-019-06532-x

Abstract

The aim of this study was to investigate the feasibility of a deep learning-based detection (DLD) system for multiclass lesions on chest radiographs, in comparison with observers.

A total of 15,809 chest radiographs were collected from two tertiary hospitals (7204 normal and 8605 abnormal with nodule/mass, interstitial opacity, pleural effusion, or pneumothorax). All radiographs except those in the test set (100 normal and 100 abnormal: nodule/mass, 70; interstitial opacity, 10; pleural effusion, 10; pneumothorax, 10) were used to develop a DLD system for detecting multiclass lesions. The diagnostic performance of the developed model and that of nine observers with varying levels of experience was evaluated and compared using the area under the receiver operating characteristic curve (AUROC) on a per-image basis and the jackknife alternative free-response receiver operating characteristic figure of merit (FOM) on a per-lesion basis. The false-positive fraction was also calculated.

Compared with the group-averaged observer performance, the DLD system demonstrated significantly higher performance in image-wise normal/abnormal classification and in lesion-wise detection with pattern classification (AUROC, 0.985 vs. 0.958; p = 0.001; FOM, 0.962 vs. 0.886; p < 0.001). In lesion-wise detection, the DLD system outperformed all nine observers. In the subgroup analysis, the DLD system exhibited consistently better performance for both nodule/mass (FOM, 0.913 vs. 0.847; p < 0.001) and the other three abnormal classes (FOM, 0.995 vs. 0.843; p < 0.001). The false-positive fraction for all abnormalities was 0.11 for the DLD system and 0.19 for the observers.

The DLD system showed potential for lesion detection and pattern classification on chest radiographs, performing normal/abnormal classification and achieving high diagnostic performance.

• The DLD system was feasible for detection with pattern classification of multiclass lesions on chest radiographs.
• The DLD system showed high performance in image-wise classification of chest radiographs as normal or abnormal (AUROC, 0.985), with especially high specificity (99.0%).
• In lesion-wise detection of multiclass lesions, the DLD system outperformed all nine observers (FOM, 0.962 vs. 0.886; p < 0.001).
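To illustrate the per-image metrics reported above, the following minimal sketch computes an AUROC as the probability that a randomly chosen abnormal radiograph receives a higher score than a randomly chosen normal one, and a specificity at a fixed threshold. The function names, the threshold, and the toy labels and scores are all hypothetical and chosen only for illustration; they are not the study's data, and the abstract does not state which software the authors used.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison (Mann-Whitney formulation).

    labels: 1 = abnormal image, 0 = normal image.
    scores: model or reader confidence that the image is abnormal.
    Ties between an abnormal and a normal score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one normal and one abnormal image")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def specificity(labels, scores, threshold):
    """Fraction of normal images scored below the (assumed) decision threshold."""
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not neg:
        raise ValueError("need at least one normal image")
    return sum(s < threshold for s in neg) / len(neg)


# Hypothetical toy data: four normal and four abnormal images.
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy_scores = [0.05, 0.10, 0.40, 0.20, 0.35, 0.80, 0.90, 0.95]

toy_auroc = auroc(toy_labels, toy_scores)
toy_spec = specificity(toy_labels, toy_scores, threshold=0.5)
print(f"AUROC = {toy_auroc:.4f}, specificity = {toy_spec:.2f}")
```

The pairwise form is quadratic in the number of images, which is acceptable for a 200-image test set such as the one described; a rank-based implementation would be preferable for much larger sets. The lesion-wise FOM requires lesion locations and free-response analysis and is not covered by this sketch.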

Full Text