Automated pneumothorax triaging in chest X-rays in the New Zealand population using deep-learning algorithms.

Sijing Feng, Stuart Barnard, Ji Soo Kim, Qixiu Liu, Jason Yeoh, Simon Gordon, Damian Azzollini, Cheng‐Kai Jin, Ben Wilson, Aakash Patel, Gregory P Tarr, Amy Fong, Julian Jang‐Jaccard, Martin Urschler, Mikal Sarrafzadeh, Cameron Simmers, Sibghat Ullah Bazai, Eve Kim

doi:10.1111/1754-9485.13393

Abstract

The primary aim was to develop convolutional neural network (CNN)-based artificial intelligence (AI) models for pneumothorax classification and segmentation for automated chest X-ray (CXR) triaging. A secondary aim was to perform interpretability analysis on the best-performing candidate model to determine whether its predictions were susceptible to bias or confounding. The CANDID-PTX dataset, which includes 19,237 anonymized and manually labelled CXRs, was used for training and testing candidate models for pneumothorax classification and segmentation. Evaluation metrics for classification performance included area under the receiver operating characteristic curve (AUC-ROC), sensitivity and specificity, whilst segmentation performance was measured using mean Dice and true-positive (TP)-Dice coefficients. Interpretability analysis was performed using Grad-CAM heatmaps. Finally, the best-performing model was implemented in a triage simulation. The best-performing model demonstrated a sensitivity of 0.93, specificity of 0.95 and AUC-ROC of 0.94 in identifying the presence of pneumothorax, and achieved a TP-Dice coefficient of 0.69 for segmentation. In the triage simulation, the mean reporting delay for pneumothorax-containing CXRs was reduced from 9.8 ± 2 days to 1.0 ± 0.5 days (P < 0.001 at the 5% significance level), with a sensitivity of 0.95 and specificity of 0.95 for classification. Interpretability analysis demonstrated that the models employed logic understandable to radiologists, with negligible bias or confounding in predictions. AI models can automate pneumothorax detection with clinically acceptable accuracy, and may reduce reporting delays for urgent findings when implemented as triaging tools.
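
To make the evaluation metrics named in the abstract concrete, the following is a minimal sketch of how sensitivity, specificity, mean Dice and TP-Dice could be computed from binary masks. It is illustrative only and is not the authors' code. It rests on several assumptions that the abstract does not confirm: an image is treated as predicted-positive when its predicted mask is non-empty, Dice is taken to be 1.0 when both masks are empty, and TP-Dice is taken to be the mean Dice over true-positive images only. The function names `dice` and `summarize` are hypothetical.

```python
import numpy as np


def dice(pred, truth):
    """Dice coefficient between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    total = pred.sum() + truth.sum()
    if total == 0:
        # Assumption: two empty masks count as perfect agreement.
        return 1.0
    return float(2.0 * np.logical_and(pred, truth).sum() / total)


def summarize(preds, truths):
    """Image-level sensitivity/specificity plus mean Dice and TP-Dice."""
    tp = fn = tn = fp = 0
    all_dice, tp_dice = [], []
    for p, t in zip(preds, truths):
        p = np.asarray(p, dtype=bool)
        t = np.asarray(t, dtype=bool)
        d = dice(p, t)
        all_dice.append(d)
        # Assumption: an image is "positive" if its mask is non-empty.
        pred_pos, true_pos = bool(p.any()), bool(t.any())
        if true_pos and pred_pos:
            tp += 1
            tp_dice.append(d)  # TP-Dice averages over these cases only
        elif true_pos:
            fn += 1
        elif pred_pos:
            fp += 1
        else:
            tn += 1
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
        "mean_dice": float(np.mean(all_dice)) if all_dice else nan,
        "tp_dice": float(np.mean(tp_dice)) if tp_dice else nan,
    }


if __name__ == "__main__":
    empty = [[0, 0], [0, 0]]
    truths = [[[1, 1], [0, 0]], [[1, 0], [0, 0]], empty, empty]
    preds = [[[1, 0], [0, 0]], empty, empty, [[0, 1], [0, 0]]]
    print(summarize(preds, truths))
```

Under these assumptions, mean Dice rewards correctly empty predictions on pneumothorax-free images, whereas TP-Dice isolates how well the outline matches when a pneumothorax is both present and detected, which is why the two are typically reported together.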

Full Text