Bearing fault diagnosis is essential to the reliable operation of mechanical systems. However, single-channel vibration sensor data cannot fully represent bearing fault characteristics, and the fault features extracted by the diagnosis model are insufficiently representative, which leads to low reliability and poor generalization under variable operating conditions. This paper therefore develops a novel intelligent bearing fault diagnosis model driven by double-level data fusion (DLDF). First, the fusion weight of each channel's data is obtained using the information entropy method. Then, a novel multi-modal image fusion strategy is proposed to integrate the time-frequency representations of the vibration data from channels X, Y, and Z at a given measurement point. Finally, the shallow and deep feature components extracted by the CNN model are fused with the feature components obtained through different mapping methods to yield more representative fault features, enabling reliable diagnosis of bearing faults under time-varying speed conditions. The efficacy and reliability of the proposed approach are demonstrated by the fault diagnosis results on two sets of bearing experimental data acquired under time-varying speed conditions.
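To make the first fusion level concrete, the following is a minimal sketch of how entropy-based channel weights could be computed and applied to three time-frequency images. It is an illustration under stated assumptions, not the formulation used in this paper: it assumes the common entropy weight convention (a channel with lower normalized Shannon entropy receives a larger weight), estimates entropy from an amplitude histogram, and fuses images by a weighted sum. The function names `channel_entropy`, `entropy_weights`, and `fuse_images`, and the `bins` parameter, are hypothetical.

```python
import numpy as np


def channel_entropy(signal, bins=64):
    """Normalized Shannon entropy (in [0, 1]) of a signal's amplitude histogram.

    Assumption: entropy is estimated from a fixed-bin histogram; the paper
    may define the underlying probability distribution differently.
    """
    counts, _ = np.histogram(np.asarray(signal, dtype=float), bins=bins)
    p = counts / counts.sum()
    p = p[p > 0]  # skip empty bins so log(0) never occurs
    return float(-(p * np.log(p)).sum() / np.log(bins))


def entropy_weights(channels, bins=64):
    """Entropy weight method: w_j = (1 - e_j) / sum_k (1 - e_k).

    Assumption: lower entropy means a larger weight. Falls back to uniform
    weights if every channel has maximal entropy.
    """
    e = np.array([channel_entropy(c, bins) for c in channels])
    d = 1.0 - e
    total = d.sum()
    if np.isclose(total, 0.0):
        return np.full(len(channels), 1.0 / len(channels))
    return d / total


def fuse_images(images, weights):
    """Weighted sum of equally shaped 2-D time-frequency images."""
    stacked = np.stack([np.asarray(img, dtype=float) for img in images])
    return np.tensordot(np.asarray(weights, dtype=float), stacked, axes=1)
```

In use, `entropy_weights([x, y, z])` would take the raw X, Y, and Z vibration records from one measurement point, and `fuse_images` would combine their time-frequency images (for example, spectrograms of equal size) into a single input image for the CNN.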