Accurate segmentation of traumatic brain injury (TBI) is of great significance for physicians in diagnosing and assessing a patient’s condition, and the use of multimodal information plays a critical role in TBI segmentation. However, most existing methods focus on the direct extraction and selection of deep semantic features. In this paper, we instead use image fusion as an auxiliary task for feature learning on top of multimodal feature extraction, achieving a more thorough fusion of multimodal features. To this end, we design a framework that combines multimodal image fusion with semantic segmentation. The proposed approach consists of a semantic encoder module, a semantic segmentation module, and an image fusion module. The semantic encoder compresses the input image into a smaller feature space to extract semantic features. The semantic segmentation module combines the low-level detail extracted by the encoder with the high-level semantic features to generate the segmentation results. The image fusion module fuses semantic features from the different modalities as an auxiliary task to semantic segmentation. Furthermore, to further improve performance, an uncertainty-based approach dynamically adjusts the loss weights of the image fusion task and the semantic segmentation task during training. The proposed method is evaluated on a private dataset and compared with other widely recognized methods, demonstrating superior performance on both the Dice score and Recall metrics.
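The abstract does not specify the exact form of the uncertainty-based loss weighting. A common formulation for such multi-task weighting is the homoscedastic-uncertainty loss of Kendall et al. (2018), in which each task carries a learnable log-variance s_t and contributes exp(-s_t)·L_t + s_t to the total loss. The sketch below illustrates that formulation only; the function name, the two-task setup (segmentation and fusion), and the specific values are illustrative assumptions, not the authors’ implementation.

```python
import math

def uncertainty_weighted_loss(task_losses, log_vars):
    """Combine per-task losses with learned homoscedastic uncertainty
    (Kendall et al., 2018):  L = sum_t exp(-s_t) * L_t + s_t,
    where s_t = log(sigma_t^2) is a trainable scalar per task.

    NOTE: illustrative sketch; the paper's actual weighting scheme
    is not given in the abstract.
    """
    total = 0.0
    for loss_t, s_t in zip(task_losses, log_vars):
        # exp(-s_t) down-weights noisy (high-uncertainty) tasks;
        # the +s_t term penalizes inflating the uncertainty without bound.
        total += math.exp(-s_t) * loss_t + s_t
    return total

# Hypothetical two-task example: segmentation loss and fusion loss.
seg_loss, fusion_loss = 1.0, 2.0
total = uncertainty_weighted_loss([seg_loss, fusion_loss], [0.0, 0.0])
```

With both log-variances at zero, the weights are exp(0) = 1 and the combined loss reduces to the plain sum of the task losses; during training the log-variances would be optimized jointly with the network parameters, so the balance between the two tasks adapts dynamically.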