Crowd-Sourced Deep Learning for Intracranial Hemorrhage Identification: Wisdom of Crowds or Laissez-Faire.

E.I.S. Hofmeijer, F. Van Der Heijden, R. Gupta, C.O. Tan

doi:10.3174/ajnr.a7902

Abstract

Researchers and clinical radiology practices are increasingly faced with the task of selecting the most accurate artificial intelligence tools from an ever-expanding range. In this study, we tested the utility of ensemble learning for determining the best combination from 70 models trained to identify intracranial hemorrhage. We also investigated whether deploying an ensemble is preferable to using the single best model. We hypothesized that the ensemble would outperform any individual model within it. This retrospective study included de-identified clinical head CT scans from 134 patients. Every section was annotated as "no-intracranial hemorrhage" or "intracranial hemorrhage," and 70 convolutional neural networks were used to classify the sections. Four ensemble learning methods were evaluated, and their accuracies, receiver operating characteristic curves, and corresponding areas under the curve were compared with those of the individual convolutional neural networks. Differences between areas under the curve were tested for statistical significance using a generalized U-statistic. The individual convolutional neural networks had an average test accuracy of 67.8% (range, 59.4%-76.0%). Three ensemble learning methods outperformed this average test accuracy, but only 1 achieved an accuracy above the 95th percentile of the individual convolutional neural network accuracy distribution. Only 1 ensemble learning method achieved an area under the curve similar to that of the single best convolutional neural network (Δarea under the curve = 0.03; 95% CI, -0.01 to 0.06; P = .17). None of the ensemble learning methods was more accurate than the single best convolutional neural network, at least in the context of intracranial hemorrhage detection.
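
The comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not name the four ensemble learning methods, so the two combiners below (majority voting and probability averaging) are assumptions chosen only for illustration, as are all function names and the 0.5 decision threshold. The data in the demo are synthetic and do not reproduce the study's results.

```python
import numpy as np


def majority_vote(probs, threshold=0.5):
    """Combine per-model probabilities of shape (n_models, n_sections) by voting.

    A section is called hemorrhage only if a strict majority of models say so;
    ties therefore resolve to "no hemorrhage" (an assumption of this sketch).
    """
    votes = probs >= threshold
    return votes.mean(axis=0) > 0.5


def mean_probability(probs, threshold=0.5):
    """Combine models by averaging their probabilities, then thresholding."""
    return probs.mean(axis=0) >= threshold


def accuracy(pred, labels):
    """Fraction of sections whose predicted label matches the annotation."""
    return float(np.mean(pred == labels))


def compare(probs, labels, threshold=0.5):
    """Summarize individual-model accuracies next to the ensemble accuracies."""
    individual = np.array([accuracy(p >= threshold, labels) for p in probs])
    return {
        "mean_individual": float(individual.mean()),
        "best_individual": float(individual.max()),
        "p95_individual": float(np.percentile(individual, 95)),
        "majority_vote": accuracy(majority_vote(probs, threshold), labels),
        "mean_probability": accuracy(mean_probability(probs, threshold), labels),
    }


if __name__ == "__main__":
    # Synthetic stand-in: 70 weak, noisy models scoring 500 sections.
    rng = np.random.default_rng(0)
    labels = rng.integers(0, 2, size=500).astype(bool)
    signal = 0.15 * np.where(labels, 1.0, -1.0)
    noise = rng.normal(0.0, 0.6, size=(70, 500))
    probs = np.clip(0.5 + signal + noise, 0.0, 1.0)
    for name, value in compare(probs, labels).items():
        print(f"{name}: {value:.3f}")
```

In this sketch the synthetic models err independently, which tends to favor ensembles. Models trained on similar data may make correlated errors, and that is one possible reason an ensemble may fail to beat the single best model, as the study reports.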

Full Text