We aimed to compare the segmentation performance of the current prominent deep learning (DL) algorithms with ground-truth segmentations and to validate the reproducibility of the manually created 2D echocardiographic four cardiac chamber ground-truth annotation. Recently emerged DL based fully-automated chamber segmentation and function assessment methods have shown great potential for future application in aiding image acquisition, quantification, and suggestion for diagnosis. However, the performance of current DL algorithms have not previously been compared with each other. In addition, the reproducibility of ground-truth annotations which are the basis of these algorithms have not yet been fully validated. We retrospectively enrolled 500 consecutive patients who underwent transthoracic echocardiogram (TTE) from December 2019 to December 2020. Simple U-net, Res-U-net, and Dense-U-net algorithms were compared for the segmentation performances and clinical indices such as left atrial volume (LAV), left ventricular end diastolic volume (LVEDV), left ventricular end systolic volume (LVESV), LV mass, and ejection fraction (EF) were evaluated. The inter- and intra-observer variability analysis was performed by two expert sonographers for a randomly selected echocardiographic view in 100 patients (apical 2-chamber, apical 4-chamber, and parasternal short axis views). The overall performance of all DL methods was excellent [average dice similarity coefficient (DSC) 0.91 to 0.95 and average Intersection over union (IOU) 0.83 to 0.90], with the exception of LV wall area on PSAX view (average DSC of 0.83, IOU 0.72). In addition, there were no significant difference in clinical indices between ground truth and automated DL measurements. For inter- and intra-observer variability analysis, the overall intra observer reproducibility was excellent: LAV (ICC = 0.995), LVEDV (ICC = 0.996), LVESV (ICC = 0.997), LV mass (ICC = 0.991) and EF (ICC = 0.984). The inter-observer reproducibility was slightly lower as compared to intraobserver agreement: LAV (ICC = 0.976), LVEDV (ICC = 0.982), LVESV (ICC = 0.970), LV mass (ICC = 0.971), and EF (ICC = 0.899). The three current prominent DL-based fully automated methods are able to reliably perform four-chamber segmentation and quantification of clinical indices. Furthermore, we were able to validate the four cardiac chamber ground-truth annotation and demonstrate an overall excellent reproducibility, but still with some degree of inter-observer variability.
Read full abstract