Segmentation of the cochlea in temporal bone computed tomography (CT) is the basis for image-guided otologic surgery, but manual segmentation is time-consuming and laborious. This study assessed the utility of deep learning for automatic segmentation of the cochleae in temporal bone CT, in abnormal as well as normal images. Three models (3D U-Net, UNETR, and SegResNet) were trained to segment the cochlea on two CT datasets acquired with two scanner types (GE 64 and GE 256): one dataset comprised 77 normal samples, and the other comprised 154 samples (77 normal and 77 abnormal). The three models were then tested on 20 samples containing normal and abnormal cochleae from three CT types (GE 64, GE 256, and SE-DS). The Dice similarity coefficient (DSC) and Hausdorff distance (HD) were used to assess the models. Segmentation performance of all three models improved after abnormal cochlear images were added to the training set. SegResNet performed best, with an average DSC of 0.94 and an HD of 0.16 mm on the test set, exceeding the 3D U-Net and UNETR models. On the GE 256, SE-DS, and GE 64 CT images, the DSCs were 0.95, 0.94, and 0.93, respectively, and the HDs were 0.15, 0.18, and 0.12 mm, respectively. The SegResNet model is feasible and accurate for automated cochlear segmentation of temporal bone CT images.
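For readers who want to see how the reported metrics are defined, the sketch below shows one way to compute DSC and HD (in millimetres) for binary cochlea masks, alongside an instantiation of MONAI's SegResNet. This is a minimal illustration, not the authors' code: the network hyperparameters, the 96-voxel patch size, and the 0.3 mm isotropic voxel spacing are assumptions (the abstract does not report them), and SciPy's directed_hausdorff is used as a generic Hausdorff implementation.

```
import numpy as np
import torch
from monai.networks.nets import SegResNet
from scipy.spatial.distance import directed_hausdorff

# Illustrative SegResNet configuration (hyperparameters are assumptions):
# single-channel CT input, two output classes (background vs. cochlea).
net = SegResNet(spatial_dims=3, in_channels=1, out_channels=2, init_filters=16)
with torch.no_grad():
    logits = net(torch.randn(1, 1, 96, 96, 96))  # (batch, class, D, H, W)

def dice_coefficient(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice similarity coefficient (DSC) between two binary 3D masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    return 2.0 * np.logical_and(pred, gt).sum() / denom if denom else 1.0

def hausdorff_mm(pred: np.ndarray, gt: np.ndarray, spacing_mm) -> float:
    """Symmetric Hausdorff distance (HD) between two binary 3D masks.

    Voxel indices are scaled by the scanner's voxel spacing so the
    distance comes out in physical units (mm), as in the abstract.
    """
    p = np.argwhere(pred.astype(bool)) * np.asarray(spacing_mm)
    g = np.argwhere(gt.astype(bool)) * np.asarray(spacing_mm)
    return max(directed_hausdorff(p, g)[0], directed_hausdorff(g, p)[0])

# Toy check: a 16-voxel cube versus the same cube shifted by one voxel,
# with an assumed 0.3 mm isotropic spacing (not the study's value).
gt = np.zeros((96, 96, 96), dtype=bool)
gt[40:56, 40:56, 40:56] = True
pred = np.zeros_like(gt)
pred[41:57, 40:56, 40:56] = True
print(round(dice_coefficient(pred, gt), 3))        # 0.938 (~0.94)
print(round(hausdorff_mm(pred, gt, (0.3,) * 3), 3))  # 0.3 mm (one voxel)
```

Scaling voxel coordinates by the spacing before computing the Hausdorff distance is what allows the HD to be reported in millimetres rather than voxels, which matters when comparing scanners with different resolutions, as done here across GE 64, GE 256, and SE-DS.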