Background Radiology practices have a high volume of unremarkable chest radiographs, and artificial intelligence (AI) could potentially improve workflow by providing an automatic report.

Purpose To estimate the proportion of unremarkable chest radiographs in which AI can correctly exclude pathology (ie, specificity) without increasing diagnostic errors.

Materials and Methods In this retrospective study, consecutive chest radiographs in unique adult patients (≥18 years of age) were obtained January 1-12, 2020, at four Danish hospitals. Exclusion criteria included insufficient radiology reports or AI output error. Two thoracic radiologists, who were blinded to AI output, labeled chest radiographs as "remarkable" or "unremarkable" based on predefined unremarkable findings (reference standard). Radiology reports were classified similarly. A commercial AI tool was adapted to output a chest radiograph "remarkableness" probability, which was used to calculate specificity at different AI sensitivities. Chest radiographs with findings missed by AI and/or the radiology report were graded by one thoracic radiologist as critical, clinically significant, or clinically insignificant. Paired proportions were compared using the McNemar test.

Results A total of 1961 patients were included (median age, 72 years [IQR, 58-81 years]; 993 female), with one chest radiograph per patient. The reference standard labeled 1231 of 1961 chest radiographs (62.8%) as remarkable and 730 of 1961 (37.2%) as unremarkable. At 99.9%, 99.0%, and 98.0% sensitivity, the AI had a specificity of 24.5% (179 of 730 radiographs [95% CI: 21, 28]), 47.1% (344 of 730 radiographs [95% CI: 43, 51]), and 52.7% (385 of 730 radiographs [95% CI: 49, 56]), respectively. With the AI fixed at a sensitivity similar to that of radiology reports (87.2%), the findings missed by AI and by reports were classified as critical in 2.2% (27 of 1231 radiographs) and 1.1% (14 of 1231 radiographs) (P = .01), as clinically significant in 4.1% (51 of 1231 radiographs) and 3.6% (44 of 1231 radiographs) (P = .46), and as clinically insignificant in 6.5% (80 of 1231 radiographs) and 8.1% (100 of 1231 radiographs) (P = .11), respectively. At sensitivities greater than or equal to 95.4%, the AI tool exhibited less than or equal to 1.1% critical misses.

Conclusion A commercial AI tool used off-label could correctly exclude pathology in 24.5%-52.7% of all unremarkable chest radiographs at greater than or equal to 98% sensitivity. The AI had equal or lower rates of critical misses than radiology reports at sensitivities greater than or equal to 95.4%. These results should be confirmed in a prospective study.

© RSNA, 2024 Supplemental material is available for this article. See also the editorial by Yoon and Hwang in this issue.
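The Methods describe using the AI "remarkableness" probability to calculate specificity at different sensitivities. As a rough illustration of that kind of calculation, and not the authors' actual code, the Python sketch below picks the highest probability threshold that still reaches a target sensitivity and then counts the unremarkable radiographs falling below it. The function names, the convention that a score at or above the threshold means "remarkable", and the use of a Wilson interval are all assumptions made here; the abstract does not state which confidence interval method was used, so the bounds produced may differ slightly from the reported ones.

```python
import math


def threshold_for_sensitivity(scores, labels, target_sensitivity):
    """Highest threshold at which sensitivity >= target (hypothetical helper).

    scores: AI "remarkableness" probabilities.
    labels: 1 = remarkable, 0 = unremarkable (reference standard).
    Assumption: a radiograph is flagged remarkable when score >= threshold.
    """
    positives = sorted(
        (s for s, y in zip(scores, labels) if y == 1), reverse=True
    )
    if not positives:
        raise ValueError("at least one remarkable case is required")
    # Number of remarkable cases that must be flagged to reach the target.
    # The small epsilon guards against floating-point overshoot in ceil().
    needed = max(1, math.ceil(target_sensitivity * len(positives) - 1e-9))
    return positives[needed - 1]


def specificity_counts(scores, labels, threshold):
    """Return (true negatives, total unremarkable) at the given threshold."""
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    true_negatives = sum(s < threshold for s in negatives)
    return true_negatives, len(negatives)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (assumed method, see lead-in)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the study.
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
    labels = [1, 1, 1, 0, 1, 0, 0, 0]
    t = threshold_for_sensitivity(scores, labels, 1.0)
    tn, n_neg = specificity_counts(scores, labels, t)
    print(f"threshold={t}, specificity={tn}/{n_neg}")

    # Counts reported in the abstract at 99.9% sensitivity.
    lo, hi = wilson_interval(179, 730)
    print(f"179/730 = {179 / 730:.3f}, Wilson 95% CI = ({lo:.3f}, {hi:.3f})")
```

Choosing the threshold from the sorted scores of remarkable cases means tied scores are all flagged together, so the achieved sensitivity can only meet or exceed the target.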