Improved Segmentation and Detection Sensitivity of Diffusion-weighted Stroke Lesions with Synthetically Enhanced Deep Learning.

Christian Federau, Maarten Lansberg, Victor Schulze-Zachau, Hanns-Christian Breit, Sebastian Kozerke, Johanna M Ospel, Soren Christensen, Noemi Schmidt, Julian Maclaren, Nino Scherrer

doi:10.1148/ryai.2020190217

Abstract

Purpose: To compare the segmentation and detection performance of a deep learning model trained on a database of human-labeled clinical stroke lesions on diffusion-weighted (DW) images with that of a model trained on the same database enhanced with synthetic stroke lesions.

Materials and Methods: In this institutional review board-approved study, a stroke database of 962 cases (mean patient age ± standard deviation, 65 years ± 17; 255 male patients; 449 scans with DW-positive stroke lesions) and a normal database of 2027 patients (mean age, 38 years ± 24; 1088 female patients) were used. Brain volumes with synthetic stroke lesions on DW images were produced by warping the relative signal increase of real strokes to normal brain volumes. A generic three-dimensional (3D) U-Net was trained on four different databases to generate four different models: (a) 375 neuroradiologist-labeled clinical DW-positive stroke cases (CDB); (b) 2000 synthetic cases (S2DB); (c) CDB plus 2000 synthetic cases (CS2DB); and (d) CDB plus 40 000 synthetic cases (CS40DB). The models were tested on 20% (n = 192) of the cases of the stroke database, which were excluded from the training set. Segmentation accuracy was characterized using Dice score and lesion volume of the stroke segmentation, and statistical significance was tested using a paired two-tailed Student t test. Detection sensitivity and specificity were compared with labeling done by three neuroradiologists.

Results: The performance of the 3D U-Net model trained on the CS40DB (mean Dice score, 0.72) was better than that of models trained on the CS2DB (Dice score, 0.70; P < .001) or the CDB (Dice score, 0.65; P < .001).
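As an illustration of the two segmentation metrics named above, the following is a minimal sketch of a Dice score on binary masks and of the paired t statistic on per-case Dice scores from two models. It is not the authors' code: the function names `dice_score` and `paired_t_statistic`, the flat-list mask representation, and the convention that two empty masks score 1.0 are all assumptions made for this example. Only the t statistic is computed; the two-tailed P value would additionally require the Student t distribution with n - 1 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev


def dice_score(pred, truth):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two flat binary masks of equal length."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


def paired_t_statistic(scores_a, scores_b):
    """t statistic of a paired t test on per-case scores from two models."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long lists with at least 2 cases")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    # Standard error of the mean difference (sample standard deviation).
    # Raises ZeroDivisionError if every difference is identical.
    std_err = stdev(diffs) / sqrt(len(diffs))
    return mean(diffs) / std_err


# Toy example with made-up masks and scores (not data from the study).
example_dice = dice_score([1, 1, 0, 0], [1, 0, 0, 0])
example_t = paired_t_statistic([0.7, 0.8, 0.9], [0.6, 0.6, 0.6])
```

In practice the masks would be 3D volumes flattened to one dimension, and the paired test would be run on the Dice scores of the same test cases segmented by two different models.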
The deep learning model (CS40DB) was also more sensitive (91% [95% confidence interval {CI}: 89%, 93%]) than each of the three human readers (human reader 3, 84% [95% CI: 81%, 87%]; human reader 1, 78% [95% CI: 75%, 81%]; human reader 2, 79% [95% CI: 76%, 82%]), but was less specific (75% [95% CI: 72%, 78%]) than each of the three human readers (human reader 3, 96% [95% CI: 94%, 98%]; human reader 1, 92% [95% CI: 90%, 94%]; human reader 2, 89% [95% CI: 86%, 91%]).

Conclusion: Deep learning training for segmentation and detection of stroke lesions on DW images was significantly improved by enhancing the training set with synthetic lesions.

Supplemental material is available for this article.

© RSNA, 2020.
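For readers who want to reproduce figures of the kind reported above, the following is a minimal sketch of how sensitivity, specificity, and a 95% confidence interval can be computed from detection counts. The abstract does not state which interval method was used, so the Wilson score interval here is an assumption, as are the function names and the counts in the example, which are made up and are not data from the study.

```python
from math import sqrt


def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion; z = 1.96 gives about 95%."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half_width = z * sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return center - half_width, center + half_width


# Hypothetical counts: 90 of 100 lesions detected, 75 of 100 normals cleared.
sens, spec = sensitivity_specificity(tp=90, fn=10, tn=75, fp=25)
sens_ci = wilson_interval(90, 100)
spec_ci = wilson_interval(75, 100)
```

The Wilson interval is chosen here only because it behaves well for proportions near 0 or 1; a normal-approximation or exact (Clopper-Pearson) interval would be equally valid choices for a sketch like this.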
