Clinically focused multi-cohort benchmarking as a tool for external validation of artificial intelligence algorithm performance in basic chest radiography analysis

Jan Rudolph, Najib Ben Khaled, Wolfgang G Kunz, Jens Ricke, Michael Ingrisch, Maximilian Jörgens, Nicola Fink, Bastian O Sabel, Julien Dinkel, Sophia Goller, Johannes Rueckel, Lena Trappmann, Vincent Schwarze, Vanessa Koliogiannis, Maximilian Fischer, Nabeel Mansour, Boj F Hoppe, Balthasar Schachtner

doi:10.1038/s41598-022-16514-7

Abstract

Artificial intelligence (AI) algorithms that evaluate [supine] chest radiographs ([S]CXRs) have increased remarkably in number in recent years. Because training and validation are often performed on subsets of the same overall dataset, external validation is mandatory to reproduce results and to reveal potential training errors. We applied multi-cohort benchmarking to the publicly accessible [S]CXR-analyzing AI algorithm CheXNet, using three clinically relevant study cohorts that differ in patient positioning (CXR vs. SCXR), in the applied reference standard (CT-based vs. [S]CXR-based) and in whether algorithm classification can also be compared with the reading performance of different medical experts. The study cohorts comprise [1] 563 CXRs acquired in the emergency unit and evaluated by 9 readers (radiologists and non-radiologists) for 4 common pathologies; [2] a collection of 6,248 SCXRs annotated by radiologists for pneumothorax presence and size and for the presence of inserted thoracic tube material, which allowed subgroup and confounding-bias analyses; and [3] 166 patients with SCXRs that were evaluated by radiologists for underlying causes of basal lung opacities, all of which were correlated with a timely acquired computed tomography (CT) scan (SCXR and CT acquired less than 90 min apart). CheXNet non-significantly exceeded the radiology resident (RR) consensus in the detection of suspicious lung nodules (cohort [1], AUC AI/RR: 0.851/0.839, p = 0.793) and the radiological readers in the detection of basal pneumonia (cohort [3], AUC AI/reader consensus: 0.825/0.782, p = 0.390) and basal pleural effusion (cohort [3], AUC AI/reader consensus: 0.762/0.710, p = 0.336) in SCXRs, in part with AUC values higher than originally published (“Nodule”: 0.780, “Infiltration”: 0.735, “Effusion”: 0.864). The “Infiltration” classifier turned out to be strongly dependent on patient positioning (best in CXRs, worst in SCXRs).

The pneumothorax SCXR cohort [2] revealed poor algorithm performance in radiographs without inserted thoracic tube material and in the detection of small pneumothoraces, which can be explained by a known systematic confounding error in the algorithm training process. The benefit of clinically relevant external validation is demonstrated by the differences in algorithm performance compared with the original publication. Finally, our multi-cohort benchmarking enables the consideration of confounders, different reference standards and patient positioning, as well as a comparison of AI performance with medical readers of differing qualification.
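The subgroup analysis described for cohort [2] rests on a simple idea: an AUC pooled over all radiographs can hide poor performance within a clinically relevant subgroup, such as radiographs without a thoracic tube. The following is a minimal sketch of that idea, assuming the standard rank-based (Mann-Whitney) AUC estimator. The function names `auc` and `auc_by_subgroup` and all data values are hypothetical and not taken from the study, and the sketch does not reproduce the statistical tests behind the reported p values.

```python
from itertools import product


def auc(scores, labels):
    """Rank-based (Mann-Whitney) ROC AUC.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case; ties count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


def auc_by_subgroup(scores, labels, groups):
    """Compute the AUC separately within each subgroup (e.g. tube present/absent)."""
    buckets = {}
    for s, y, g in zip(scores, labels, groups):
        bucket_scores, bucket_labels = buckets.setdefault(g, ([], []))
        bucket_scores.append(s)
        bucket_labels.append(y)
    return {g: auc(s, y) for g, (s, y) in buckets.items()}


# Hypothetical toy data (NOT from the study): pneumothorax scores of a
# classifier that partly keys on tube presence, the reference label
# (1 = pneumothorax) and whether a thoracic tube is visible.
scores = [0.90, 0.85, 0.80, 0.60, 0.30, 0.20, 0.35, 0.25, 0.15, 0.40]
labels = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
tube = [True, True, True, True, False, False, False, False, False, False]

pooled = auc(scores, labels)
per_group = auc_by_subgroup(scores, labels, tube)
print(f"pooled AUC: {pooled:.3f}")  # pooled AUC: 0.720
for has_tube, value in per_group.items():
    # prints "tube=True: AUC 1.000", then "tube=False: AUC 0.375"
    print(f"tube={has_tube}: AUC {value:.3f}")
```

In this toy example the pooled AUC of 0.720 looks acceptable, while the tube-free subgroup drops to 0.375, below chance. Stratifying by a suspected confounder is what exposes this pattern; the pairwise comparison is O(n·m) and is used here only for clarity, not efficiency.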


Journal: Scientific Reports
Publication date: Jul 27, 2022
Citations: 7
License: open access

Similar Papers

Comparison of Chest Radiograph Interpretations by Artificial Intelligence Algorithm vs Radiology Residents
Joy T Wu ... Satyananda Kashyap
JAMA Network Open, Vol. 3, 09 Oct 2020

Artificial Intelligence for Early Prediction of Pulmonary Hypertension Using Electrocardiography
J Kwon ... K Kim
The Journal of Heart and Lung Transplantation, Vol. 39, 30 Mar 2020

AI-based improvement in lung cancer detection on chest radiographs: results of a multi-reader study in NLST dataset
Hyunsuk Yoo ... Sean Siebert
European Radiology, Vol. 31, 04 Jun 2021

Comparative Performance of Artificial Intelligence Algorithms for Screening Mammography
Michio Taya
Radiology: Imaging Cancer, Vol. 2, 01 Nov 2020