Abstract

Objectives
To compare the performance of artificial intelligence (AI) with that of physicians in classifying pulmonary nodules as benign or malignant on computed tomography (CT) images.

Methods
A total of 506 CT images with pulmonary nodules were retrospectively collected. The AI was trained using in-house software. The diagnostic performance of the AI and of different groups of physicians was compared using receiver operating characteristic (ROC) curves and the area under the curve (AUC). The nodules in the CT images were also analyzed case by case.

Results
The AI achieved higher diagnostic accuracy than every group of physicians, with an AUC of 0.88, a sensitivity of 0.80, a specificity of 0.84, and an accuracy of 0.83. Across the seven groups of physicians, the AUC ranged from 0.63 to 0.84, the sensitivity from 0.4 to 0.76, the specificity from 0.8 to 0.85, and the accuracy from 0.7 to 0.82. The case-by-case examination yielded professional insights for improving deep learning models.

Conclusions
AI demonstrated great potential in the benign–malignant classification of pulmonary nodules, with higher accuracy than physicians. AI will provide more accurate information to support clinical decision-making.
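For readers less familiar with the metrics reported above, the following is a minimal sketch of how sensitivity, specificity, accuracy, and AUC can be computed for a binary benign/malignant classifier. It is not the study's in-house software. The function names, the 0.5 decision threshold, and the ten labels and scores are hypothetical values chosen only for illustration, and the AUC is computed with the standard pairwise-ranking definition rather than any method confirmed by the authors.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn), treating 1 as malignant and 0 as benign."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def sensitivity_specificity_accuracy(y_true, y_pred):
    """Compute the three threshold-dependent metrics from hard predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)          # malignant nodules correctly flagged
    specificity = tn / (tn + fp)          # benign nodules correctly cleared
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


def auc(y_true, scores):
    """AUC as the probability that a random malignant case receives a higher
    score than a random benign case; ties count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): 4 malignant, 6 benign nodules.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.35, 0.2, 0.1, 0.05]

# Assumed decision threshold of 0.5 to turn scores into hard predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, spec, acc = sensitivity_specificity_accuracy(y_true, y_pred)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")
print(f"AUC={auc(y_true, scores):.2f}")
```

Sensitivity, specificity, and accuracy depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason a reader can score comparably on specificity yet lower on AUC.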