Magnetic resonance (MR) imaging is widely used in clinical practice because it is non-invasive, but long scan times remain a bottleneck to its wider adoption. Exploiting the complementary information across multi-modal MR images to accelerate acquisition offers an effective route to fast MR imaging. However, previous methods mostly rely on simple fusion strategies and fail to fully exploit the knowledge the modalities share. In this study, we introduce a novel multi-hierarchical complementary feature interaction network (MHCFIN) that jointly reconstructs multi-modal MR images from undersampled data, thereby accelerating multi-modal imaging. First, multiple attention mechanisms are integrated into a dual-branch encoder–decoder network to represent the shared and complementary features of the different modalities. In the decoding stage, the multi-modal feature interaction module (MMFIM) acts as a bridge between the two branches, transferring complementary knowledge between modalities through cross-level fusion. The single-modal feature fusion module (SMFFM) performs multi-scale representation and optimization of features within each modality, which better preserves anatomical details. Extensive experiments were conducted under different sampling patterns and acceleration factors. The results show that the proposed method clearly outperforms existing state-of-the-art reconstruction methods in both visual quality and quantitative metrics.
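To make "sampling pattern" and "acceleration factor" concrete, the sketch below simulates retrospective undersampling. It keeps roughly 1/R of the phase-encoding lines (here, columns of centered k-space) of a 2D image, with a fully sampled low-frequency band plus randomly chosen lines, and forms the zero-filled image that a joint reconstruction network would take as input. This is a minimal illustration under stated assumptions, not the masks used in the experiments above. The function names `cartesian_mask` and `zero_filled_recon`, the `center_fraction` value, the random 1D Cartesian pattern, and the choice to share one mask across both modalities are all hypothetical. The two random arrays stand in for two co-registered modalities.

```python
import numpy as np


def cartesian_mask(shape, accel, center_fraction=0.08, seed=0):
    """Hypothetical random 1D Cartesian mask for centered k-space.

    Keeps a fully sampled band of central columns plus randomly chosen
    columns, so that about 1/accel of all columns are sampled.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = shape
    n_keep = int(round(n_cols / accel))               # total columns to keep
    n_center = int(round(n_cols * center_fraction))   # low-frequency band width
    center_start = (n_cols - n_center) // 2
    center = np.arange(center_start, center_start + n_center)
    others = np.setdiff1d(np.arange(n_cols), center)  # candidates outside band
    n_random = max(n_keep - n_center, 0)
    chosen = rng.choice(others, size=n_random, replace=False)
    cols = np.zeros(n_cols, dtype=bool)
    cols[center] = True
    cols[chosen] = True
    # Every row shares the same column pattern (full frequency encoding).
    return np.broadcast_to(cols, (n_rows, n_cols)).copy()


def zero_filled_recon(image, mask):
    """Undersample the image's centered k-space and inverse-transform it."""
    kspace = np.fft.fftshift(np.fft.fft2(image))      # centered k-space
    undersampled = kspace * mask                      # unsampled entries -> 0
    return np.abs(np.fft.ifft2(np.fft.ifftshift(undersampled)))


rng = np.random.default_rng(1)
t1 = rng.random((64, 64))  # synthetic stand-in for one modality
t2 = rng.random((64, 64))  # synthetic stand-in for a second modality
mask = cartesian_mask(t1.shape, accel=4)
t1_zf = zero_filled_recon(t1, mask)
t2_zf = zero_filled_recon(t2, mask)
print(mask[0].mean())  # 0.25: one quarter of the columns are sampled
```

Raising `accel` removes more lines and strengthens the aliasing in `t1_zf` and `t2_zf`. Joint multi-modal methods aim to undo exactly this degradation by letting each modality's branch borrow structure from the other.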