Deep neural networks (DNNs) excel at industrial fault diagnosis, but their performance relies heavily on the quality of human-annotated labels. Because of annotators' perceptual limitations, industrial time series samples (such as vibration and voltage signals) are frequently mislabeled under several conditions, for example samples with frequency-domain feature differences and samples on class borders. An annotated industrial dataset will therefore inevitably contain some level of label noise, which leads to over-fitting and poor generalization of DNNs. In this work, we introduce an Industrial Noisy Label Semi-Supervised Learning (INL-SSL) fault diagnosis approach that addresses the problem of mislabeled samples in industrial datasets. The proposed INL-SSL architecture simultaneously trains two DNNs, which cross-train each other to filter out noisy labels. In particular, a fitted Gaussian mixture model divides the time series samples of each DNN flow into a labeled set of samples likely to be clean and an unlabeled set of samples likely to be noisy. Given the labeled and unlabeled data, we propose a time series MixMatch semi-supervised learning strategy to train the diagnostic model. An ablation study verifies the benefit of the proposed time series augmentation techniques for semi-supervised training. Extensive experiments on a benchmark industrial dataset of rolling element bearings (REB) show that INL-SSL outperforms state-of-the-art approaches. On a second, self-collected REB dataset, the proposed approach also outperforms the compared methods under noise ratios from 20% to 90%, validating the model's generalizability.
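The clean/noisy split described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the Gaussian mixture model is fit on per-sample training losses (a common choice in noisy-label learning that the abstract does not confirm), it uses a small hand-written two-component 1-D EM routine, and the function names, the 0.5 threshold, and the synthetic loss values are all hypothetical.

```python
import math


def _weighted_densities(value, means, variances, weights):
    """Weighted Gaussian density of `value` under each of the two components."""
    return [
        weights[k]
        * math.exp(-0.5 * (value - means[k]) ** 2 / variances[k])
        / math.sqrt(2.0 * math.pi * variances[k])
        for k in range(2)
    ]


def fit_two_component_gmm(values, n_iter=50, min_var=1e-6):
    """Fit a 1-D two-component Gaussian mixture with plain EM."""
    lo, hi = min(values), max(values)
    means = [lo, hi]  # initialise the components at the two extremes
    init_var = max((hi - lo) ** 2 / 4.0, min_var)
    variances = [init_var, init_var]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each value
        resp = []
        for v in values:
            dens = _weighted_densities(v, means, variances, weights)
            total = sum(dens)
            resp.append([d / total for d in dens] if total > 0 else [0.5, 0.5])
        # M-step: re-estimate mean, variance and weight of each component
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # leave an empty component unchanged
            means[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            var = sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, values)) / nk
            variances[k] = max(var, min_var)
            weights[k] = nk / len(values)
    return means, variances, weights


def split_by_loss(losses, threshold=0.5):
    """Return (labeled_indices, unlabeled_indices).

    Assumption: samples with a high posterior under the low-loss component
    are treated as likely clean and keep their labels; the remaining samples
    are treated as likely noisy and become unlabeled data.
    """
    means, variances, weights = fit_two_component_gmm(losses)
    clean = 0 if means[0] <= means[1] else 1  # low-mean component = "clean"
    labeled, unlabeled = [], []
    for i, loss in enumerate(losses):
        dens = _weighted_densities(loss, means, variances, weights)
        total = sum(dens)
        p_clean = dens[clean] / total if total > 0 else 0.0
        (labeled if p_clean > threshold else unlabeled).append(i)
    return labeled, unlabeled


# Synthetic per-sample losses (hypothetical): 80 low-loss and 20 high-loss samples.
losses = [0.1 + 0.002 * i for i in range(80)] + [1.5 + 0.05 * i for i in range(20)]
labeled_idx, unlabeled_idx = split_by_loss(losses)
print(len(labeled_idx), len(unlabeled_idx))
```

In a two-network setup like the one the abstract describes, each network's split would typically be handed to the other network for the semi-supervised step, so that neither network trains only on its own filtering decisions; that hand-off is not shown here.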