Abstract

Multi-modal learning is typically performed with network architectures containing modality-specific layers and shared layers, utilizing co-registered images of different modalities. We propose a novel learning scheme for unpaired cross-modality image segmentation, with a highly compact architecture achieving superior segmentation accuracy. In our method, we heavily reuse network parameters by sharing all convolutional kernels across CT and MRI, and employ only modality-specific internal normalization layers which compute their respective statistics. To effectively train such a highly compact model, we introduce a novel loss term inspired by knowledge distillation, explicitly constraining the KL-divergence between the prediction distributions derived from the two modalities. We have extensively validated our approach on two multi-class segmentation problems: i) cardiac structure segmentation, and ii) abdominal organ segmentation. Different network settings, i.e., a 2D dilated network and a 3D U-Net, are utilized to investigate our method's general efficacy. Experimental results on both tasks demonstrate that our novel multi-modal learning scheme consistently outperforms single-modal training and previous multi-modal approaches.
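For illustration, below is a minimal PyTorch sketch of the shared-kernel, modality-specific-normalization idea described above. It is not the authors' released code (see the repository linked in the Introduction); the module and argument names are hypothetical.

```python
import torch
import torch.nn as nn

class SharedConvDualNorm(nn.Module):
    """One convolution shared across CT and MRI, with a separate norm layer per modality."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # A single set of convolutional weights, reused for both modalities.
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        # Separate normalization layers, so each modality keeps its own statistics.
        self.norm = nn.ModuleDict({
            "ct": nn.BatchNorm2d(out_ch),
            "mr": nn.BatchNorm2d(out_ch),
        })
        self.act = nn.ReLU(inplace=True)

    def forward(self, x, modality):
        # modality is "ct" or "mr"; only the normalization path differs.
        return self.act(self.norm[modality](self.conv(x)))

# Usage: the same kernels process unpaired batches from both modalities.
block = SharedConvDualNorm(1, 16)
ct_feat = block(torch.randn(2, 1, 64, 64), "ct")
mr_feat = block(torch.randn(2, 1, 64, 64), "mr")
```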

Highlights

  • Anatomical structures are imaged with a variety of modalities depending on the clinical indication

  • We extensively evaluate our method on two Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) multi-class segmentation tasks, including cardiac segmentation with a 2D dilated convolutional neural network (CNN) and abdominal multi-organ segmentation with a 3D U-Net

  • The two key aspects are: 1) separating internal feature normalizations for each modality, given the very different statistical distributions of CT and MRI; 2) knowledge distillation from pre-softmax activations, in order to leverage information shared across modalities to guide the multi-modal learning (see the sketch of the distillation term after this list)
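As a hedged sketch of the second aspect, the snippet below computes a symmetric KL-divergence between temperature-softened prediction distributions derived from the pre-softmax activations of the two modalities. The temperature value and the simple spatial averaging used here are illustrative assumptions, not the paper's exact derivation of its prediction distributions.

```python
import torch
import torch.nn.functional as F

def kd_kl_loss(ct_logits, mr_logits, temperature=2.0):
    # ct_logits / mr_logits: pre-softmax activations, shape (N, C, H, W).
    # Average over batch and spatial dims to obtain one class distribution per modality.
    ct_mean = ct_logits.mean(dim=(0, 2, 3)) / temperature
    mr_mean = mr_logits.mean(dim=(0, 2, 3)) / temperature
    ct_log_p = F.log_softmax(ct_mean, dim=0)
    mr_log_p = F.log_softmax(mr_mean, dim=0)
    # Symmetric KL divergence between the two modality-wise distributions.
    kl_ct_mr = F.kl_div(mr_log_p, ct_log_p.exp(), reduction="sum")  # KL(p_ct || p_mr)
    kl_mr_ct = F.kl_div(ct_log_p, mr_log_p.exp(), reduction="sum")  # KL(p_mr || p_ct)
    return 0.5 * (kl_ct_mr + kl_mr_ct)
```

In training, such a term would be added, with a weighting factor, to the per-modality segmentation losses; the exact overall loss and weighting are described in the paper's Methods section.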


Summary

INTRODUCTION

Anatomical structures are imaged with a variety of modalities depending on the clinical indication. A common strategy for multi-modal learning is early fusion, i.e., concatenating multi-modal images as different channels at the input layer of a network. This strategy has demonstrated effectiveness on segmenting brain tissue [3]–[5] and brain lesions [6]–[8] in multiple sequences of MRI. More complex multi-modal CNNs have been designed, leveraging dense connections [12], inception modules [13] or multi-scale feature fusion [14]. These more complicated models still follow the idea of combining modality-specific and shared layers. Our paper proposes a novel compact model for unpaired CT and MRI multi-modal segmentation, by explicitly addressing distribution shift and distilling cross-modality knowledge. Code for our proposed approach is publicly available at https://github.com/carrenD/ummkd
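As a toy illustration of early fusion, co-registered images of different modalities are simply stacked as input channels of one network; this is precisely what unpaired CT and MRI data do not permit. The tensor shapes and names below are assumptions for illustration only.

```python
import torch

ct = torch.randn(1, 1, 64, 64)   # a CT slice
mr = torch.randn(1, 1, 64, 64)   # a co-registered MRI slice of the same anatomy
fused_input = torch.cat([ct, mr], dim=1)  # shape (1, 2, 64, 64): one two-channel input
```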

RELATED WORK
Independent normalization of CT and MRI
Knowledge distillation
METHODS
Separate internal feature normalization
Knowledge distillation loss
Overall loss function and training procedure
EXPERIMENTS
Datasets and networks
Experimental settings
Segmentation results and comparison with state-of-the-arts
Methods
Analytical ablation studies
CT confusion matrix
DISCUSSION
Findings
CONCLUSION