Segmentation of the mitral valve is important not only for clinical diagnosis but also for the prevention and prognosis of mitral valve disease. In this paper, we propose MCCT-UNet, a U-Net model based on a multi-channel cross-fusion transformer and built on the classical U-Net architecture. First, the original skip connections of U-Net are replaced by a multi-channel cross-fusion transformer attention module (MCCT), which fuses feature maps of different scales from different stages of the encoder. Second, in the decoding stage, simple feature concatenation is replaced by a cross squeeze-and-excitation sub-module (C-SENet), which bridges the semantic inconsistency between levels by effectively combining deeper information from the encoding stage with shallower information. Together, these two modules establish a close connection between the encoder and the decoder by exploiting multi-scale global contextual information, which closes the semantic gap and significantly improves the segmentation performance of the network. Experimental results confirm the effectiveness of these improvements: MCCT-UNet outperforms nine other network models, achieving a Dice coefficient of 0.8734, an IoU of 0.7854, and an accuracy of 0.9977.
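The three reported figures (Dice, IoU, accuracy) are standard scores for binary segmentation masks. The sketch below shows one common way to compute them from per-pixel counts of true/false positives and negatives. The function names and the smoothing constant `eps` are illustrative assumptions, and the sketch does not reproduce the paper's evaluation protocol (for example, whether scores are averaged per image or pooled over the whole dataset).

```python
from typing import Sequence


def confusion_counts(pred: Sequence[int], target: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for two flattened binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    if len(pred) == 0:
        raise ValueError("masks must not be empty")
    tp = fp = fn = tn = 0
    for p, t in zip(pred, target):
        if p and t:
            tp += 1  # foreground predicted as foreground
        elif p:
            fp += 1  # background predicted as foreground
        elif t:
            fn += 1  # foreground missed by the prediction
        else:
            tn += 1  # background correctly left as background
    return tp, fp, fn, tn


def segmentation_metrics(pred: Sequence[int], target: Sequence[int],
                         eps: float = 1e-7) -> dict[str, float]:
    """Dice, IoU and pixel accuracy; eps (an assumed value) avoids 0/0 on empty masks."""
    tp, fp, fn, tn = confusion_counts(pred, target)
    dice = (2 * tp + eps) / (2 * tp + fp + fn + eps)   # 2|A∩B| / (|A| + |B|)
    iou = (tp + eps) / (tp + fp + fn + eps)             # |A∩B| / |A∪B|
    accuracy = (tp + tn) / (tp + fp + fn + tn)          # fraction of correct pixels
    return {"dice": dice, "iou": iou, "accuracy": accuracy}


if __name__ == "__main__":
    # Toy 8-pixel masks (hypothetical data, not from the paper).
    pred = [0, 1, 1, 1, 0, 0, 0, 0]
    target = [0, 1, 1, 0, 1, 0, 0, 0]
    m = segmentation_metrics(pred, target)
    print({k: round(v, 4) for k, v in m.items()})
    # prints {'dice': 0.6667, 'iou': 0.5, 'accuracy': 0.75}
```

Because pixel accuracy also counts correctly classified background pixels, it tends to approach 1 whenever the target structure occupies a small fraction of the image, so Dice and IoU are usually the more informative overlap measures for a thin structure such as the mitral valve.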