Abstract

Background: A major challenge in real-world echocardiography is the difficulty of obtaining high-quality images in some patients or clinical settings. Is AI only useful when image quality is good?

Purpose: To artificially degrade adequate-quality images and compare the ability of human experts and AI to make measurements correctly as image degradation worsens.

Methods: PLAX dimension measurements were made on videos of 30 patients with a range of LV dimensions (mean 138 mm, SD 37 mm). To set the gold standard, 9 experts measured each image, blinded to the measurements made by the others. For each original image, 5 progressively more degraded versions were then made. The degradation was designed to be maximally confusing for this measurement, namely the addition of faint ghost images of random PLAX views of other patients. This process is automatable, reproducible, and not easy to undo by conventional image processing techniques. The 30 cases, each in original form plus 5 degraded versions, totalled 180 images. They were presented in random order for labelling by a pool of experts, and were also analysed by the Unity UK Echocardiography AI collaborative.

Results: An example degradation sequence is shown in Figure 1 (upper panel), with images cropped for this abstract. Across all measurements, both the experts and the AI suffered progressively greater measurement error as the level of image degradation increased. On average, the AI error was N smaller than the expert error (p<0.05). For example, for LV internal dimension, the progressive rise in human error was 2.2 mm, 2.5 mm, 3.1 mm, 3.6 mm, 5.3 mm, 9.6 mm (p<0.001 for trend; Figure 1, lower panel, grey bars). For the AI, the corresponding errors were 2.5 mm, 2.5 mm, 2.7 mm, 3.0 mm, 3.9 mm, 8.0 mm (p<0.001 for trend; Figure 1, lower panel). The Minimum Heatmap Amplitude (MHA), an automatic index of AI confidence in its measurement, also declined progressively (p<0.001 for trend).
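The abstract does not specify how the ghost images were combined with the originals; a minimal sketch of the ghost-overlay idea, assuming simple alpha blending of grayscale frames (the function name, the five alpha levels, and the random stand-in frames are illustrative, not the authors' implementation):

```python
import numpy as np

def add_ghost(frame, ghost, alpha):
    """Blend a faint 'ghost' of another patient's PLAX frame into the target.

    frame, ghost: 2-D grayscale arrays scaled to [0, 1]
    alpha: ghost strength (0 = original image, higher = more degraded)
    """
    out = (1.0 - alpha) * frame + alpha * ghost
    return np.clip(out, 0.0, 1.0)

# Five progressively stronger degradation levels (alpha values are assumed).
alphas = [0.1, 0.2, 0.3, 0.4, 0.5]

rng = np.random.default_rng(0)
frame = rng.random((64, 64))   # stand-in for an adequate-quality PLAX frame
ghost = rng.random((64, 64))   # stand-in for a random other patient's PLAX view
levels = [add_ghost(frame, ghost, a) for a in alphas]
```

Because the same blend is applied deterministically, the procedure is automatable and reproducible, and recovering the original frame would require knowing which ghost frame and weight were used, consistent with the claim that it is not easy to undo by conventional image processing.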
Conclusion: Humans and AI alike make less accurate measurements as image quality degrades, although the deterioration in accuracy is more predictable for the AI measurements. Importantly, deterioration in image quality (and therefore doubtfulness of the measurement) can be automatically quantified through the MHA, to flag measurements needing special attention.

Figure 1: Progressive degradation in images (upper panel) and the corresponding increase in measurement error by both human experts and AI (lower panel).