Abstract
In industrial settings, failures of rotating machinery, particularly of bearings, can cause significant economic losses for enterprises. Traditional fault diagnosis methods often generalize poorly and resist noise weakly. To address these limitations, this paper introduces CCTAR, a fault diagnosis model that runs a cross convolutional Transformer and ResNet18 in parallel. CCTAR uses two feature extraction branches to balance the extraction of local and global features. Its specially designed convolutional cross-decoding layer is highly robust to noise and, with only a single layer, outperforms conventional multi-layer Transformer encoder stacks. CCTAR achieves high recognition accuracy on multiple datasets and maintains high accuracy in noisy environments. Furthermore, transfer learning experiments show that the proposed model delivers superior fault diagnosis performance across different working conditions with a limited number of samples, highlighting its practical significance.