Fault diagnosis of agricultural machinery is critical to agricultural automation, and identifying the vibration signals of faulty bearings is central to both fault diagnosis and predictive maintenance. In recent years, data-driven methods based on deep learning have received much attention. To address the coarse attention receptive fields of Vision Transformer and Swin Transformer, this paper proposes a Shift-Deformable Transformer (S-DT) network with multi-attention fusion for accurate diagnosis of composite faults. First, the vibration signal is transformed into a time-frequency image by the continuous wavelet transform (CWT). Second, dilated convolutional residual blocks and efficient attention for cross-spatial learning enhance low-level local features. Third, shifted-window attention and deformable attention are fused into S-D Attention, whose more focused receptive field allows global features to be learned accurately. Finally, a classifier produces the diagnosis result. Experiments were conducted on self-collected and public datasets. The results show that the proposed S-DT network performs strongly in all cases: with slightly fewer parameters, validation accuracy improves by more than 2%, and the network converges quickly during training. This provides an effective solution for monitoring the efficient and stable operation of agricultural automation machinery and equipment.
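The first stage of the pipeline, converting a vibration signal into a time-frequency representation with the CWT, can be illustrated with a minimal sketch. The abstract does not state which mother wavelet or scale grid the authors use, so the complex Morlet wavelet with `w0 = 6`, the function names `morlet` and `cwt_scalogram`, and the synthetic 50 Hz test tone are all assumptions made for illustration only; they are not taken from the paper or its datasets.

```python
import cmath
import math


def morlet(t, w0=6.0):
    """Complex Morlet mother wavelet (assumed choice; small correction term omitted)."""
    return math.pi ** -0.25 * cmath.exp(1j * w0 * t) * math.exp(-0.5 * t * t)


def cwt_scalogram(signal, fs, freqs, w0=6.0):
    """Return |CWT| as a list of rows, one row per analysis frequency in Hz."""
    n = len(signal)
    rows = []
    for f in freqs:
        # Scale (in seconds) whose Morlet centre frequency equals f.
        scale = w0 / (2.0 * math.pi * f)
        # Truncate the Gaussian envelope at four standard deviations.
        half = int(math.ceil(4.0 * scale * fs))
        # Conjugated, scaled wavelet sampled on the signal's time grid.
        kernel = [
            morlet(k / (fs * scale), w0).conjugate() / math.sqrt(scale)
            for k in range(-half, half + 1)
        ]
        row = []
        for i in range(n):
            acc = 0j
            for j, kv in enumerate(kernel):
                idx = i + j - half
                if 0 <= idx < n:  # treat samples outside the signal as zero
                    acc += signal[idx] * kv
            row.append(abs(acc) / fs)  # 1/fs approximates the integral's dt
        rows.append(row)
    return rows


# Hypothetical stand-in for a bearing vibration signal: a pure 50 Hz tone.
fs = 1000.0
signal = [math.sin(2.0 * math.pi * 50.0 * n / fs) for n in range(512)]
freqs = [20.0, 50.0, 120.0]
scalogram = cwt_scalogram(signal, fs, freqs)
```

Each row of `scalogram` is the magnitude of the wavelet coefficients at one analysis frequency. Stacking many such rows over a denser frequency grid would give the kind of two-dimensional time-frequency image that an image-based network could take as input. A practical implementation would more likely use an FFT-based routine from a signal-processing library than this direct convolution.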