Abstract Fault diagnosis of rolling bearings is important for the safe operation of engineering equipment. Many intelligent diagnostic methods have been developed successfully, but they are often susceptible to noisy environments. This paper therefore proposes a rolling bearing fault diagnosis method based on multimodal information fusion in the time and time–frequency domains. The method, called wavelet improved CNN-ResNet (WCNN-RSN), combines an improved 1D convolutional neural network (CNN) with ResNet50. It employs a Multi-Head Attention (MHA) mechanism to fuse complementary fault features at different scales, so that diagnosis draws on fully extracted fault features. Experimental results show that WCNN-RSN outperforms the comparison methods under noise interference, indicating that the proposed method has good anti-noise and generalization ability.
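
For orientation, the standard scaled dot-product form of multi-head attention, on which an MHA fusion step of this kind would typically build, can be written as below. The abstract does not say how the queries, keys, and values are formed from the two branches. The last line is therefore an assumption for illustration only: the symbols F_t (time-domain features), F_tf (time–frequency features), and F_fused are hypothetical names, not notation from the paper.

```latex
\begin{align}
\mathrm{Attention}(Q, K, V) &= \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V, \\
\mathrm{head}_i &= \mathrm{Attention}\!\left(Q W_i^{Q},\, K W_i^{K},\, V W_i^{V}\right), \quad i = 1, \dots, h, \\
\mathrm{MHA}(Q, K, V) &= \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\, W^{O}, \\
% Assumed (hypothetical) fusion: queries from one branch, keys and values from the other.
F_{\mathrm{fused}} &= \mathrm{MHA}\!\left(F_{t},\, F_{tf},\, F_{tf}\right).
\end{align}
```

Here h is the number of heads, d_k is the per-head key dimension, and W_i^Q, W_i^K, W_i^V, and W^O are learned projection matrices. Other assignments, such as self-attention over the concatenated features of both branches, are equally plausible readings of the abstract.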