The use of multi-modality non-contrast images (i.e., T1FS, T2FS and DWI) for segmenting liver tumors provides a solution by eliminating the use of contrast agents and is crucial for clinical diagnosis. However, this remains a challenging task to discover the most useful information to fuse multi-modality images for accurate segmentation due to inter-modal interference. In this paper, we propose a dual-stream multi-level fusion framework (DM-FF) to, for the first time, accurately segment liver tumors from non-contrast multi-modality images directly. Our DM-FF first designs an attention-based encoder–decoder to effectively extract multi-level feature maps corresponding to a specified representation of each modality. Then, DM-FF creates two types of fusion modules, in which a module fuses learned features to obtain a shared representation across multi-modality images to exploit commonalities and improve the performance, and a module fuses the decision evidence of segment to discover differences between modalities to prevent interference caused by modality’s conflict. By integrating these three components, DM-FF enables multi-modality non-contrast images to cooperate with each other and enables an accurate segmentation. Evaluation on 250 patients including different types of tumors from two MRI scanners, DM-FF achieves a Dice of 81.20%, and improves performance (Dice by at least 11%) when comparing the eight state-of-the-art segmentation architectures. The results indicate that our DM-FF significantly promotes the development and deployment of non-contrast liver tumor technology.