Abstract

Funding Acknowledgements Type of funding sources: Other. Main funding source(s): J. Schwitter receives research support from "Bayer Schweiz AG". C.N.C. received grant support from Siemens. Gianluca Pontone received institutional fees from General Electric, Bracco, Heartflow, Medtronic, and Bayer. U.J.S. received grant support from Astellas, Bayer, and General Electric. This work was supported by the Italian Ministry of Health, Rome, Italy (RC 2017 R659/17-CCM698). This work was supported by Gyrotools, Zurich, Switzerland.

Background Late gadolinium enhancement (LGE) scar quantification is generally recognized as an accurate and reproducible technique, but it is observer-dependent and time-consuming. Machine learning (ML) could potentially solve this problem.

Purpose To develop and validate an ML algorithm for scar quantification that fully avoids observer variability, and to apply this algorithm to the prospective international multicentre Derivate cohort.

Method The Derivate Registry collected heart failure patients with LV ejection fraction <50% in 20 European and US centres. In the post-myocardial infarction patients (n = 689), the quality of the LGE short-axis breath-hold images was graded (good, acceptable, sufficient, borderline, poor, excluded), and ground truth (GT) was produced (endo- and epicardial contours, 2 remote reference regions, artefact elimination) to determine the mass of non-infarcted myocardium, of dense scar (≥5 SD above mean-remote), and of non-dense scar (>2 SD to <5 SD above mean-remote). Data were divided into a learning set (total n = 573; training: n = 289; testing: n = 284) and a validation set (n = 116). A Ternaus-network (loss function = average of Dice and binary cross-entropy) produced 4 outputs (initial prediction, test-time augmentation (TTA), threshold-based prediction (TB), and TTA + TB) representing normal myocardium, non-dense scar, and dense scar (Figure 1). Outputs were evaluated by Dice metrics, Bland-Altman analysis, and correlations.
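The SD-based ground-truth definitions in the Method can be illustrated with a minimal NumPy sketch. It is not the study's implementation: the function names `classify_scar` and `scar_fractions`, the label codes, the use of the sample SD of the pooled remote regions, and the assumption of uniform voxel size (so that pixel fractions stand in for mass fractions) are all hypothetical choices made for illustration.

```python
import numpy as np

def classify_scar(image, myo_mask, remote_mask):
    """Label pixels: 0 = outside myocardium, 1 = non-infarcted,
    2 = non-dense scar (>2 SD to <5 SD), 3 = dense scar (>=5 SD).

    remote_mask is assumed to be the union of the remote reference regions.
    """
    remote = image[remote_mask]
    mean_remote = remote.mean()
    sd_remote = remote.std(ddof=1)  # sample SD; an assumption of this sketch

    labels = np.zeros(image.shape, dtype=np.uint8)
    labels[myo_mask] = 1
    non_dense = (
        myo_mask
        & (image > mean_remote + 2 * sd_remote)
        & (image < mean_remote + 5 * sd_remote)
    )
    dense = myo_mask & (image >= mean_remote + 5 * sd_remote)
    labels[non_dense] = 2
    labels[dense] = 3
    return labels

def scar_fractions(labels):
    """Scar as a percentage of all myocardial pixels (uniform voxels assumed)."""
    myo = np.count_nonzero(labels > 0)
    return {
        "non_dense_pct": 100.0 * np.count_nonzero(labels == 2) / myo,
        "dense_pct": 100.0 * np.count_nonzero(labels == 3) / myo,
    }

# Toy example: first row is the remote region, second row contains scar.
image = np.array([[10.0, 12.0, 8.0, 10.0],
                  [14.0, 17.0, 19.0, 30.0]])
myo_mask = np.ones(image.shape, dtype=bool)
remote_mask = np.zeros(image.shape, dtype=bool)
remote_mask[0, :] = True

labels = classify_scar(image, myo_mask, remote_mask)
fractions = scar_fractions(labels)
```

In a real pipeline the masks would come from the endo- and epicardial contours and the manually placed remote regions, and pixels flagged as artefacts would be excluded before thresholding.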
Results In the validation and test data sets, neither of which was used for training, the dense scar GT was 20.8 ± 9.6% and 21.9 ± 13.3% of LV mass, respectively. The TTA-network yielded the best results, with small biases vs GT (-2.2 ± 6.1%, p < 0.02; -1.7 ± 6.0%, p < 0.003, respectively) and a 95% CI vs GT in the range of inter-human comparisons: the SD of the differences vs GT was 6.1 and 6.0 percentage points (%p) in the validation and test data, respectively (Figure 2), comparable to the 7.7%p for the inter-observer comparison (n = 40). For non-dense scar, TTA performance was similar, with small biases (-1.9 ± 8.6%, p < 0.0005 and -1.4 ± 8.2%, p < 0.0001 in the validation and test sets, respectively; GT 39.2 ± 13.8% and 42.1 ± 14.2%) and an acceptable 95% CI, with SD of the differences of 8.6 and 8.2%p for TTA vs GT, respectively, and 9.3%p for the inter-observer comparison.

Conclusions In the large Derivate cohort from 20 centres, the presented ML algorithm quantifies dense and non-dense scar fully automatically with performance comparable to that of experienced human observers, with small bias and an acceptable 95% CI. Such a tool could facilitate scar quantification in clinical routine as it eliminates human observer variability and can handle large data sets.
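The bias and SD of the differences reported in the Results are the quantities of a Bland-Altman comparison. The sketch below shows how they could be computed from paired scar percentages. The function name `bland_altman`, the toy values, and the reading of the abstract's "95% CI" as limits of agreement of bias ± 1.96 SD are assumptions, not details confirmed by the source.

```python
import numpy as np

def bland_altman(pred, ref):
    """Return bias, SD of the differences, and 95% limits of agreement.

    pred and ref are paired measurements (e.g. scar as % of LV mass),
    one value per patient.
    """
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    diff = pred - ref
    bias = diff.mean()
    sd = diff.std(ddof=1)  # sample SD of the paired differences
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return bias, sd, limits

# Toy values only; these are not data from the study.
pred = [18.0, 22.0, 25.0, 30.0]
ref = [20.0, 21.0, 28.0, 31.0]
bias, sd, limits = bland_altman(pred, ref)
```

Under the same assumption, the reported validation-set bias of -2.2%p with an SD of 6.1%p for dense scar would correspond to limits of agreement of roughly -14.2 to 9.8%p.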