Abstract

Background/Introduction: Cine cardiovascular magnetic resonance (CMR) is the gold-standard technique for assessing cardiac function and structure. Manual analysis of cine CMR to estimate structural and functional biomarkers is costly and time-consuming, motivating the use of artificial intelligence (AI) for automation. However, the scan-rescan repeatability of AI-derived biomarkers remains unknown.

Purpose: To measure the repeatability of AI-based cine CMR biomarkers in scan-rescan data.

Methods: Ninety-two scan-rescan pairs of short-axis cine CMR (184 scans in total) were acquired from volunteers, with both scans of each pair acquired on the same day, and annotated with ground truth (GT) manual segmentations of the left and right ventricle (LV, RV) blood pools and the LV myocardium at end-diastole (ED) and end-systole (ES). AI segmentations and subsequent biomarker estimates were produced using the AI-CMR-QC tool described in [1]. Segmentation performance was assessed with Dice score, pixelwise sensitivity and specificity, and Hausdorff distance. Scan-rescan differences in cardiac biomarkers for each subject were used to assess repeatability. Repeatability of classification based on LV ejection fraction (LVEF) (LVEF < 40% = abnormal, 40% < LVEF < 50% = mildly reduced, LVEF > 50% = normal) [2] was also estimated from the scan-rescan data.

Results: The segmentation model obtained high Dice scores (> 88% for the LV and RV blood pools, > 83% for the LV myocardium). The summary of scan-rescan biomarker differences in Fig. 1(a) shows that biomarker repeatability is generally better for GT segmentations, with smaller differences between the two scans. Although overall performance is satisfactory, closer analysis reveals that the segmentations are inconsistent in some pairs of scans; the resulting fluctuations in the computed blood pool volumes reduce the repeatability of the derived biomarkers, as seen in Fig. 2(b).
For instance, substantial changes in LVEF between pairs of scans can arise from inconsistencies in the segmentation of the LV in basal slices, as shown in Fig. 2(a). This affects the classification of subjects based on LVEF, as shown in Fig. 1(b). Only two subjects belonged to the mildly reduced LVEF class; however, when comparing predicted scan and rescan LVEF, both sensitivity and specificity decrease, suggesting inconsistency in the biomarker estimation.

Conclusion(s): While we have demonstrated successful AI cardiac segmentation and biomarker estimation based on conventional performance metrics, additional refinement is necessary to enhance reliability and consistency. Further work is needed to obtain a better delineation of the valve plane, which would help address the inconsistencies at the LV basal slice. An automated segmentation and analysis pipeline for cine CMR is vital to support overworked clinicians; however, it is demonstrably repeatable performance, particularly when assessing therapy response, that will have a significant impact on patient care and clinical outcomes.

Figure 1: Repeatability of GT and AI biomarkers.
Figure 2: Repeatability of GT and AI segmentations.
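To make the quantities in the Methods concrete, the following is a minimal illustrative sketch of the Dice score, the LVEF computation, the LVEF classification, and a per-subject scan-rescan comparison. It is not the AI-CMR-QC implementation described in [1]. All function names, the dictionary keys, and the example volumes are hypothetical. The class thresholds are those stated in the Methods; assigning values of exactly 40% or 50% to the mildly reduced class is an assumption made here so that every value receives a label.

```python
from statistics import mean


def dice_score(mask_a, mask_b):
    """Dice overlap 2|A n B| / (|A| + |B|) for two binary masks given as 2D lists."""
    flat_a = [bool(v) for row in mask_a for v in row]
    flat_b = [bool(v) for row in mask_b for v in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for a, b in zip(flat_a, flat_b) if a and b)
    total = sum(flat_a) + sum(flat_b)
    if total == 0:
        return 1.0  # assumption: two empty masks count as perfect agreement
    return 2.0 * intersection / total


def ejection_fraction(edv_ml, esv_ml):
    """Ejection fraction in percent from end-diastolic and end-systolic volumes."""
    return 100.0 * (edv_ml - esv_ml) / edv_ml


def classify_lvef(lvef_percent):
    """Three-class LVEF label using the thresholds quoted in the Methods."""
    if lvef_percent < 40.0:
        return "abnormal"
    if lvef_percent > 50.0:
        return "normal"
    # Assumption: exactly 40% and exactly 50% fall in the middle class.
    return "mildly reduced"


def scan_rescan_summary(scan_volumes, rescan_volumes):
    """Compare LVEF between scan and rescan.

    Each argument is a list of (EDV, ESV) tuples in ml, one per subject,
    in the same subject order.
    """
    ef_scan = [ejection_fraction(edv, esv) for edv, esv in scan_volumes]
    ef_rescan = [ejection_fraction(edv, esv) for edv, esv in rescan_volumes]
    abs_diffs = [abs(b - a) for a, b in zip(ef_scan, ef_rescan)]
    same_class = [
        classify_lvef(a) == classify_lvef(b) for a, b in zip(ef_scan, ef_rescan)
    ]
    return {
        "mean_abs_lvef_diff": mean(abs_diffs),
        "class_agreement": sum(same_class) / len(same_class),
    }


# Hypothetical volumes for two subjects (not data from the study).
scan = [(150.0, 60.0), (180.0, 100.0)]
rescan = [(148.0, 64.0), (182.0, 112.0)]
print(scan_rescan_summary(scan, rescan))
```

In this hypothetical example the second subject moves from the mildly reduced class to the abnormal class between scan and rescan, which is the kind of change in classification that a small shift in ES volume can produce.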