Multimodal magnetic resonance imaging (MRI) provides complementary information in anatomical and functional images that aids accurate diagnosis and treatment evaluation of lung cancer. However, effectively exploiting this complementary information in chest MR images remains challenging due to the lack of rigorous registration. In this paper, we propose a novel method, the coco-attention mechanism, that effectively exploits the complementary information in weakly paired images for accurate tumor segmentation. The coco-attention module consists of two parts: the multi-modal co-attention (MultiCo-attn) and the multi-level coordinate attention (MultiCord-attn). The former aims to obtain tumor-aware deep features for accurate tumor localization, and the latter aims to highlight the tumor region for more precise segmentation. Specifically, the MultiCo-attn extracts complementary information from multimodal high-dimensional semantic features using a bidirectional algorithm to generate attention maps focused on the tumor region, and then uses these attention maps to enhance the feature representations. The MultiCord-attn leverages multi-level feature information to highlight tumor regions by adjusting the weight of each position in the feature maps. We evaluate the proposed method on lung tumor segmentation with a clinical dataset of 90 chest MRI scans of non-small cell lung cancer (NSCLC). The results show that the proposed method is effective for tumor segmentation in weakly paired images and achieves a significant improvement (p < 0.005) over several commonly used multimodal segmentation methods. Furthermore, the ablation results confirm the effectiveness and interpretability of the proposed coco-attention module.
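To make the bidirectional cross-modal attention idea concrete, the following is a minimal NumPy sketch of one plausible form of bidirectional co-attention between two modality feature maps. It is an illustrative assumption, not the paper's actual MultiCo-attn implementation: the affinity computation, softmax normalization, and residual enhancement are generic choices, and the function and variable names are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def bidirectional_co_attention(feat_a, feat_b):
    """Hypothetical sketch of bidirectional co-attention between two
    weakly paired modality feature maps of shape (C, H, W).

    Each modality attends over the other's spatial positions, and the
    attended features are added back as a residual enhancement.
    """
    c, h, w = feat_a.shape
    a = feat_a.reshape(c, h * w)           # (C, N) flattened positions
    b = feat_b.reshape(c, h * w)

    # Cross-modal affinity: similarity of every position pair. (N, N)
    affinity = a.T @ b

    # Row-normalize in each direction to get attention maps.
    attn_a = softmax(affinity, axis=1)     # how modality A attends to B
    attn_b = softmax(affinity.T, axis=1)   # how modality B attends to A

    # Aggregate the other modality's features and enhance residually.
    feat_a_enh = feat_a + (b @ attn_a.T).reshape(c, h, w)
    feat_b_enh = feat_b + (a @ attn_b.T).reshape(c, h, w)
    return feat_a_enh, feat_b_enh
```

In this sketch the two attention maps are derived from a single shared affinity matrix, which is one common way to couple the two directions; a learned projection before the affinity computation would be a natural extension in a trainable network.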