Abstract

Automated tumor segmentation plays a critical role in diagnosing disease and assessing its progression. Within tumor segmentation, Contrast-Enhanced (CE) scans are an effective imaging tool: they allow more intuitive observation of tumor characteristics and generally yield better segmentation results than Non-Contrast-Enhanced (NCE) scans alone. However, CE images are unavailable in most cases because contrast administration and repeat scanning are time-consuming and costly. To address this issue, this paper proposes a Collaborative framework for the Synthesis and Segmentation of missing CE images with error-prediction consistency (CSS-Net). CSS-Net addresses the synthesis and segmentation tasks simultaneously, producing both synthesized CE images and coarse segmentation results. A multi-layer adaptive feature fusion strategy then exploits the correlation between the two tasks to obtain refined segmentation results; the underlying multi-layer feature fusion block adaptively selects the features most pertinent to segmentation. Furthermore, error-prediction consistency between the coarse and refined segmentations is introduced as regularization, yielding high-performance segmentation results. In addition, we constructed a multimodal esophageal tumor segmentation dataset of 902 patients and validated CSS-Net on this dataset and two publicly available multimodal brain tumor datasets. Our method achieved Dice scores of 89.04% for esophageal tumor segmentation, 77.01% for whole glioma segmentation, and 91.14% for Vestibular Schwannoma segmentation. This performance surpasses both segmentation using only the available modalities and other image synthesis-based segmentation methods, demonstrating the superior robustness of CSS-Net.
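
For illustration, the error-prediction consistency regularizer mentioned above can be sketched as a loss term coupling the coarse and refined predictions. This is a minimal sketch, not the paper's implementation: it assumes the "error map" of a prediction is its voxelwise absolute deviation from the one-hot ground truth, that consistency is enforced as a mean-squared penalty between the two error maps, and that the loss weights are placeholders. All names (error_prediction_consistency, lambda_epc) are hypothetical.

    import torch
    import torch.nn.functional as F

    def error_prediction_consistency(coarse_logits: torch.Tensor,
                                     refined_logits: torch.Tensor,
                                     target_onehot: torch.Tensor) -> torch.Tensor:
        # Voxelwise error maps: absolute deviation of each prediction's
        # class probabilities from the one-hot ground truth (an assumption,
        # not the paper's stated definition).
        coarse_err = (coarse_logits.softmax(dim=1) - target_onehot).abs()
        refined_err = (refined_logits.softmax(dim=1) - target_onehot).abs()
        # Penalize disagreement between the coarse and refined error maps.
        return F.mse_loss(refined_err, coarse_err)

    # Hypothetical usage inside a training step, with placeholder weights:
    # loss = seg_loss + lambda_syn * synthesis_loss \
    #        + lambda_epc * error_prediction_consistency(coarse, refined, y_onehot)

Under these assumptions, the term regularizes the refinement stage by encouraging it to remain consistent with where the coarse branch errs, rather than letting the two branches diverge arbitrarily.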