Abstract
Multi-modal learning is typically performed with network architectures containing modality-specific layers and shared layers, utilizing co-registered images of different modalities. We propose a novel learning scheme for unpaired cross-modality image segmentation, with a highly compact architecture that achieves superior segmentation accuracy. In our method, we heavily reuse network parameters by sharing all convolutional kernels across CT and MRI, and employ only modality-specific internal normalization layers, which compute their respective statistics. To effectively train such a highly compact model, we introduce a novel loss term inspired by knowledge distillation, which explicitly constrains the KL divergence between the prediction distributions derived from each modality. We have extensively validated our approach on two multi-class segmentation problems: i) cardiac structure segmentation, and ii) abdominal organ segmentation. Different network settings, i.e., a 2D dilated network and a 3D U-Net, are used to investigate the general efficacy of our method. Experimental results on both tasks demonstrate that our novel multi-modal learning scheme consistently outperforms single-modal training and previous multi-modal approaches.
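To make the parameter-sharing idea concrete, below is a minimal PyTorch-style sketch (not the authors' released implementation) of a layer whose convolutional kernels are shared across CT and MRI while each modality keeps its own internal normalization layer. The module name, the choice of BatchNorm, and the channel sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SharedConvDualNorm(nn.Module):
    """Sketch: convolution kernels shared across CT and MRI, with one
    normalization layer per modality tracking its own statistics."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # A single set of convolutional kernels serves both modalities.
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False)
        # Modality-specific normalization: separate statistics and affine
        # parameters for CT and MRI (BatchNorm is an assumption here).
        self.norms = nn.ModuleDict({
            "ct": nn.BatchNorm2d(out_ch),
            "mri": nn.BatchNorm2d(out_ch),
        })
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor, modality: str) -> torch.Tensor:
        # Same kernels for both modalities; only normalization differs.
        return self.act(self.norms[modality](self.conv(x)))
```

A network built from such layers would be invoked as, e.g., `layer(ct_batch, "ct")` or `layer(mri_batch, "mri")`, so both modalities traverse identical kernels but are normalized with their own statistics.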
Highlights
Anatomical structures are imaged with a variety of modalities depending on the clinical indication
We extensively evaluate our method on two multi-class Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) segmentation tasks: cardiac segmentation with a 2D dilated convolutional neural network (CNN) and abdominal multi-organ segmentation with a 3D U-Net
The two key aspects are: 1) separating internal feature normalizations for each modality, given the very different statistical distributions of CT and MRI; 2) knowledge distillation from pre-softmax activations, in order to leverage information shared across modalities to guide the multi-modal learning
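As an illustration of point 2), here is a simplified sketch of a KD-style alignment term: it pools pre-softmax activations per class, softens them with a temperature, and penalizes the KL divergence between the distributions derived from the two modalities. The function name, the pooling over label masks, the symmetric form, and the temperature value are all assumptions for illustration; the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def kd_alignment_loss(logits_ct, logits_mri, mask_ct, mask_mri, temperature=2.0):
    """Sketch of a cross-modality KD term (not the paper's exact loss).

    logits_*: (N, C, H, W) pre-softmax activations for each modality
    mask_*:   (N, H, W) integer label maps used to pool activations per class
    temperature: softening factor (value here is a guess)
    """
    n_classes = logits_ct.shape[1]
    loss = logits_ct.new_zeros(())
    for c in range(n_classes):
        sel_ct = (mask_ct == c)
        sel_mri = (mask_mri == c)
        if sel_ct.any() and sel_mri.any():
            # Average the C-dimensional activation vectors over the pixels
            # belonging to class c, yielding one distribution per modality.
            z_ct = logits_ct.permute(0, 2, 3, 1)[sel_ct].mean(dim=0)    # (C,)
            z_mri = logits_mri.permute(0, 2, 3, 1)[sel_mri].mean(dim=0)  # (C,)
            p_ct = F.softmax(z_ct / temperature, dim=0)
            p_mri = F.softmax(z_mri / temperature, dim=0)
            # Symmetric KL keeps the constraint bidirectional.
            loss = loss + F.kl_div(p_ct.log(), p_mri, reduction="sum")
            loss = loss + F.kl_div(p_mri.log(), p_ct, reduction="sum")
    return loss / n_classes
```

This term would be added to the ordinary per-modality segmentation losses, so the shared kernels are pushed toward predictions whose class-confidence structure agrees across CT and MRI.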
Summary
Anatomical structures are imaged with a variety of modalities depending on the clinical indication. Early fusion means concatenating multi-modal images as different channels at the input layer of a network; this strategy has demonstrated effectiveness for segmenting brain tissue [3]–[5] and brain lesions [6]–[8] in multiple sequences of MRI. More complex multi-modal CNNs have been designed by leveraging dense connections [12], inception modules [13], or multi-scale feature fusion [14], yet these more complicated models still follow the idea of combining modality-specific and shared layers. Our paper proposes a novel compact model for unpaired CT and MRI multi-modal segmentation, explicitly addressing distribution shift and distilling cross-modality knowledge. Code for our proposed approach is publicly available at https://github.com/carrenD/ummkd
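For contrast with the proposed method, the early-fusion baseline mentioned above can be sketched in a few lines: co-registered modalities are simply stacked as input channels of a single network. The class name and layer sizes are hypothetical; note that, unlike the paper's unpaired setting, this baseline requires paired, registered images.

```python
import torch
import torch.nn as nn

class EarlyFusionNet(nn.Module):
    """Early-fusion baseline sketch: modalities are concatenated as
    channels at the input layer (requires co-registered images)."""

    def __init__(self, n_modalities: int, n_classes: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(n_modalities, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, n_classes, kernel_size=1),
        )

    def forward(self, *modalities: torch.Tensor) -> torch.Tensor:
        # e.g. forward(t1, t2, flair), each tensor shaped (N, 1, H, W)
        return self.net(torch.cat(modalities, dim=1))
```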