Automatic segmentation of temporal bone structures from patients' conventional computed tomography (CT) data plays an important role in image-guided cochlear implant surgery. Existing convolutional neural network approaches have difficulty segmenting such small tubular structures. We propose a lightweight three-dimensional convolutional neural network, referred to as W-Net, that simultaneously segments multiple temporal bone structures, namely the cochlear labyrinth, ossicular chain and facial nerve, from conventional temporal bone CT images. We also propose data augmentation with morphological enhancement to increase the segmentation accuracy of small tubular structures. The method is evaluated against state-of-the-art methods. It achieved mean Dice similarity coefficients (DSCs) of 0.90, 0.85 and 0.77 for the cochlear labyrinth, ossicular chain and facial nerve, respectively. These values are close to the DSCs between human expert annotators (0.91, 0.91 and 0.72, respectively), indicating that our method achieves accuracy comparable to that of human experts in segmenting these three structures.
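For readers unfamiliar with the metric, the DSC between a predicted region A and a reference region B is 2|A ∩ B| / (|A| + |B|), computed separately for each structure. The following is a minimal sketch of that computation, not the authors' evaluation code: the function name `dice_coefficient`, the integer label ids, the flattened voxel lists and the convention for empty regions are all assumptions made for illustration.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], truth: Sequence[int], label: int) -> float:
    """Return DSC = 2|A ∩ B| / (|A| + |B|) for the voxels carrying `label`.

    `pred` and `truth` are flattened label volumes of equal length.
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same number of voxels")
    pred_count = sum(1 for p in pred if p == label)
    truth_count = sum(1 for t in truth if t == label)
    overlap = sum(1 for p, t in zip(pred, truth) if p == label and t == label)
    if pred_count + truth_count == 0:
        # Assumed convention: two empty regions count as perfect agreement.
        return 1.0
    return 2.0 * overlap / (pred_count + truth_count)


# Hypothetical label ids (not from the paper): 0 = background,
# 1 = cochlear labyrinth, 2 = ossicular chain, 3 = facial nerve.
pred = [0, 1, 1, 2, 3, 3, 0, 2]
truth = [0, 1, 1, 2, 2, 3, 0, 0]
scores = {label: dice_coefficient(pred, truth, label) for label in (1, 2, 3)}
```

In this toy example the first structure overlaps perfectly, while the other two are penalized for voxels that appear in only one of the two segmentations. Because the denominator is the total region size, a single misplaced voxel costs far more for a small structure than for a large one.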