"Just Accepted" papers have undergone full peer review and have been accepted for publication in Radiology: Artificial Intelligence. This article will undergo copyediting, layout, and proof review before it is published in its final version. Please note that during production of the final copyedited article, errors may be discovered which could affect the content. Purpose To evaluate the performance of the winning machine learning (ML) models from the 2023 RSNA Abdominal Trauma Detection Artificial Intelligence Challenge. Materials and Methods The competition was hosted on Kaggle and took place between July 26, 2023, to October 15, 2023. The multicenter competition dataset consisted of 4,274 abdominal trauma CT scans in which solid organs (liver, spleen and kidneys) were annotated as healthy, low-grade or high-grade injury. Studies were labeled as positive or negative for the presence of bowel/mesenteric injury and active extravasation. In this study, performances of the 8 award-winning models were retrospectively assessed and compared using various metrics, including the area under the receiver operating characteristic curve (AUC), for each injury category. The reported mean values of these metrics were calculated by averaging the performance across all models for each specified injury type. Results The models exhibited strong performance in detecting solid organ injuries, particularly high-grade injuries. For binary detection of injuries, the models demonstrated mean AUC values of 0.92 (range:0.91-0.94) for liver, 0.91 (range:0.87-0.93) for splenic, and 0.94 (range:0.93-0.95) for kidney injuries. The models achieved mean AUC values of 0.98 (range:0.96-0.98) for high-grade liver, 0.98 (range:0.97-0.99) for high-grade splenic, and 0.98 (range:0.97-0.98) for high-grade kidney injuries. For the detection of bowel/mesenteric injuries and active extravasation, the models demonstrated mean AUC values of 0.85 (range:0.74-0.73) and 0.85 (range:0.79-0.89) respectively. 
Conclusion The award-winning models from the AI challenge demonstrated strong performance in the detection of traumatic abdominal injuries on CT scans, particularly high-grade injuries. These models may serve as a performance baseline for future investigations and algorithms. ©RSNA, 2024.
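The summary statistics reported above (a mean AUC with a range across models) can be illustrated with a minimal sketch. It assumes each model outputs one score per study for a binary injury label; the function names, model names, and toy data below are hypothetical and are not taken from the challenge or its evaluation code. AUC is computed here with the pairwise (Mann-Whitney) formulation, which may differ from the implementation the authors used.

```python
from statistics import mean


def binary_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; tied pairs count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize_across_models(labels, scores_by_model):
    """Compute the AUC of each model, then the mean and range across models."""
    aucs = {name: binary_auc(labels, s) for name, s in scores_by_model.items()}
    values = list(aucs.values())
    return {
        "per_model": aucs,
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
    }


# Toy, made-up data: 1 = injury present, 0 = no injury (hypothetical).
labels = [0, 0, 1, 1, 0, 1]
scores_by_model = {
    "model_a": [0.1, 0.4, 0.35, 0.8, 0.2, 0.9],
    "model_b": [0.2, 0.3, 0.6, 0.7, 0.1, 0.5],
}

summary = summarize_across_models(labels, scores_by_model)
print(f"mean AUC {summary['mean']:.2f} "
      f"(range: {summary['min']:.2f}-{summary['max']:.2f})")
```

Because the mean is taken over per-model AUC values, it always lies between the reported minimum and maximum for a given injury type.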