Mechanical fault diagnosis is crucial for ensuring the safe operation of equipment in intelligent manufacturing systems. Recently, deep-learning-based fault diagnosis methods have achieved remarkable advances using monitoring data from a single sensor. However, satisfactory diagnostic results are often difficult to obtain from a single sensor because the complementary information between different sensors is ignored, and extracting comprehensive fault features from multi-modal data remains an open problem. To address these challenges, a time-segment-wise feature fusion Transformer (FFTR) is proposed in this paper. First, the signals from various modalities are stacked as multiple channels, normalized channel by channel, and assembled into a multi-modal sample. Second, a time-segment-wise feature learning network is designed to transform a multi-modal sample into several fused features through the sequential steps of sample segmentation, segment-level feature extraction, and time-aligned feature fusion. Finally, a Transformer network is employed for comprehensive multi-modal feature analysis and fault classification. In addition, a joint loss function is designed to train the end-to-end FFTR comprehensively. Comparison experiments with baseline methods are conducted on two multi-modal datasets. The experimental results show that FFTR achieves 3.69% and 3.93% higher diagnostic accuracy than the baselines on the two datasets, respectively, and can address real-world problems effectively.
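To make the described pipeline concrete, the following is a minimal PyTorch sketch of the processing chain from the abstract: each channel-normalized multi-modal sample is split into time segments, a segment-level extractor produces a feature per modality, time-aligned features are fused, and a Transformer encoder performs classification. All layer sizes, the 1-D CNN extractor, the concatenation-based fusion rule, and the class count are illustrative assumptions rather than the paper's exact architecture, and the joint loss function is omitted.

```python
import torch
import torch.nn as nn


class FFTRSketch(nn.Module):
    """Hedged sketch of the FFTR pipeline; all hyperparameters are assumed."""

    def __init__(self, n_modalities=2, seg_len=64, n_segments=16,
                 feat_dim=64, n_classes=10):
        super().__init__()
        self.seg_len = seg_len
        self.n_segments = n_segments
        # One segment-level extractor per modality (assumption: a small 1-D CNN).
        self.extractors = nn.ModuleList([
            nn.Sequential(
                nn.Conv1d(1, 16, kernel_size=5, padding=2), nn.ReLU(),
                nn.AdaptiveAvgPool1d(1), nn.Flatten(),
                nn.Linear(16, feat_dim),
            ) for _ in range(n_modalities)
        ])
        # Time-aligned fusion: concatenate per-modality features of the same
        # segment, then project back to feat_dim (assumed fusion rule).
        self.fuse = nn.Linear(n_modalities * feat_dim, feat_dim)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, x):
        # x: (batch, n_modalities, n_segments * seg_len),
        # already normalized channel by channel upstream.
        b, m, _ = x.shape
        # Sample segmentation: (batch, modality, n_segments, seg_len).
        segs = x.reshape(b, m, self.n_segments, self.seg_len)
        fused = []
        for t in range(self.n_segments):
            # Segment-level feature extraction per modality, then fusion.
            feats = [self.extractors[i](segs[:, i, t].unsqueeze(1))
                     for i in range(m)]
            fused.append(self.fuse(torch.cat(feats, dim=-1)))
        tokens = torch.stack(fused, dim=1)        # (batch, n_segments, feat_dim)
        h = self.transformer(tokens).mean(dim=1)  # pool over segments
        return self.classifier(h)


# Usage: a batch of 8 two-modality samples of length 16 * 64 = 1024.
model = FFTRSketch()
logits = model(torch.randn(8, 2, 1024))
print(logits.shape)  # torch.Size([8, 10])
```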