Background: Magnetic resonance imaging (MRI) methods have become routine for measuring liver iron concentration (LIC) in hemoglobinopathy patients in high-income countries, but analysing the image data requires specialised training [1]. While the cost of outsourcing data analysis is acceptable to many patients or providers in high-income countries, the majority of hemoglobinopathy patients reside in low-income countries. Automated LIC measurement using artificial intelligence (AI) or deep-learning-assessed (DLA) algorithms could provide a globally affordable and reliable means of monitoring LIC. Here we update information presented at the EHA Annual Congress 2022.
Aims: To evaluate the diagnostic performance and repeatability of an automated deep-learning-based medical device (DLA R2-MRI) for measuring LIC from MRI.
Methods: The DLA R2-MRI device was assessed prospectively on 1395 eligible consecutive MRI datasets from 63 different scanners that had been submitted for expert manual analysis using spin-density projection assisted (SDPA) R2-MRI (the reference standard) between August 2017 and July 2020. The Human Research Ethics Committee waived the requirement for informed consent. The aetiologies of iron overload reported by the radiologists submitting the image data were thalassemias (477), hereditary hemochromatosis (168), sickle cell disease (152), myelodysplastic syndromes (MDS) (11), other (316), and unknown (271). The bias and limits of agreement between the automated and manual measurements of LIC were assessed, and diagnostic performance was assessed using sensitivity and specificity analysis. Repeatability of the DLA R2-MRI device was assessed in 60 participants recruited with informed consent (50 patients and 10 healthy controls), each measured twice with DLA R2-MRI (time between visits: minimum 1 hour, maximum 7 days). Limits of agreement, bias, and repeatability coefficients were determined using Bland-Altman statistics [2].
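As an illustration of the agreement statistics named in the Methods, the following is a minimal sketch of a Bland-Altman analysis on the ratio scale. It assumes the comparison is made on log-transformed LIC values, which is inferred from the results being reported as geometric mean ratios and is not stated as the authors' procedure. The function name `ratio_bland_altman` and the input values are hypothetical and are not study data.

```python
import math
from statistics import mean, stdev


def ratio_bland_altman(automated, manual):
    """Bias and 95% limits of agreement expressed as ratios.

    Assumption: agreement is analysed on log-transformed values, so the
    bias back-transforms to a geometric mean ratio (automated / manual).
    """
    if len(automated) != len(manual) or len(automated) < 2:
        raise ValueError("need two equal-length series with at least 2 pairs")

    # Log of the paired ratio; a value of 0 means perfect agreement.
    log_ratios = [math.log(a / m) for a, m in zip(automated, manual)]

    bias = mean(log_ratios)             # mean log ratio
    half_width = 1.96 * stdev(log_ratios)  # 95% limits on the log scale

    # Back-transform so the results read as multiplicative factors.
    return {
        "geometric_mean_ratio": math.exp(bias),
        "lower_limit": math.exp(bias - half_width),
        "upper_limit": math.exp(bias + half_width),
    }


# Hypothetical paired LIC values in mg Fe/g dry tissue (not study data).
automated_lic = [2.8, 4.6, 6.5, 11.9, 18.2]
manual_lic = [3.0, 5.0, 7.0, 12.5, 20.0]
print(ratio_bland_altman(automated_lic, manual_lic))
```

The same function could be applied to repeat measurements from one method, in which case the limits describe repeatability rather than agreement between methods.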
Results: The distribution of LIC values measured by the reference method was 31.6% (LIC < 3 mg Fe/g); 17.3% (3 ≤ LIC < 5 mg Fe/g); 11.4% (5 ≤ LIC < 7 mg Fe/g); 18.7% (7 ≤ LIC < 15 mg Fe/g); 21.0% (LIC > 15 mg Fe/g). The automated LIC results from the DLA R2-MRI are plotted against the results from the reference standard in the Figure, where the solid line is the line of equivalence. The geometric mean ratios of the automated LIC results from the DLA R2-MRI to the manually derived results from SDPA R2-MRI were 0.98 (95% CI 0.94 to 1.01) below 3 mg Fe/g dry tissue and 0.93 (95% CI 0.92 to 0.95) above 3 mg Fe/g dry tissue. The sensitivities and specificities of the automated DLA R2-MRI system for predicting LIC values by the SDPA R2-MRI method above the clinically relevant thresholds [3] of 3.0, 5.0, 7.0, and 15.0 mg Fe/g dry tissue are shown in the Table. For the repeat measurements of LIC by DLA R2-MRI, 95% of the ratios fell between 0.79 and 1.26 above 3 mg Fe/g dry tissue and between 0.64 and 1.57 below 3 mg Fe/g dry tissue.
Table:
| Clinically relevant threshold [3] (mg Fe/g dry tissue) | Sensitivity [95% CI] (%) | Specificity [95% CI] (%) |
| 3.0 | 96 [94-97] | 95 [92-98] |
| 5.0 | 91 [89-94] | 97 [95-99] |
| 7.0 | 92 [90-95] | 97 [95-98] |
| 15.0 | 89 [85-93] | 98 [98-99] |
Summary/Conclusion: The repeatability of the DLA R2-MRI method for LIC measurement is significantly better than that of liver biopsy methods [4-6]. Although there is an overall bias between DLA R2-MRI and SDPA R2-MRI, it does not result in unacceptable sensitivities and specificities of DLA R2-MRI for predicting SDPA R2-MRI results above the clinically relevant LIC thresholds. However, the bias between the automated and manual methods indicates that the two techniques should not be used interchangeably.
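The threshold analysis summarised in the Table can be illustrated with a minimal sketch that treats the reference (SDPA R2-MRI) value as ground truth and asks whether the automated value falls on the same side of each threshold. Two points are assumptions rather than details from the abstract: "above" is taken to mean strictly greater than the threshold, and confidence intervals are omitted. The function name `sensitivity_specificity` and the paired values are hypothetical and are not study data.

```python
def sensitivity_specificity(reference, automated, threshold):
    """Sensitivity and specificity of `automated` for predicting
    reference values above `threshold`.

    Assumption: a measurement counts as positive when it is strictly
    greater than the threshold.
    """
    tp = fn = tn = fp = 0
    for ref, auto in zip(reference, automated):
        ref_positive = ref > threshold
        auto_positive = auto > threshold
        if ref_positive and auto_positive:
            tp += 1
        elif ref_positive:
            fn += 1
        elif auto_positive:
            fp += 1
        else:
            tn += 1

    # Undefined (NaN) when a class has no members at this threshold.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical paired LIC values in mg Fe/g dry tissue (not study data).
reference_lic = [1.0, 2.5, 4.0, 6.0, 8.0, 16.0]
automated_lic = [1.1, 3.2, 3.8, 5.5, 7.5, 14.0]

for threshold in (3.0, 5.0, 7.0, 15.0):
    sens, spec = sensitivity_specificity(reference_lic, automated_lic, threshold)
    print(f"threshold {threshold}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

A systematic bias below 1 between the two methods would tend to lower sensitivity more than specificity near each threshold, which is consistent with the pattern in the Table.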