Abstract

Root-cause analysis for integrated systems has become increasingly challenging as these systems grow more complex. Machine learning (ML) has been applied to address this challenge, but ML-based root-cause analysis usually requires abundant training data with root causes labeled by human experts, and such data are difficult or even impossible to obtain. To overcome this drawback, this paper proposes a semi-supervised co-training method for root-cause analysis that requires only a small portion of labeled data. First, a random forest is trained on the labeled data. Next, a co-training technique learns from the unlabeled data: it automatically pre-labels a subset of these data and then retrains each decision tree in the random forest. In addition, a robust framework is proposed to avoid over-fitting. Initialization by clustering and feature selection are further applied to improve diagnostic performance. In two case studies from industry, the proposed approach outperforms other state-of-the-art methods, saving up to 67% of the labeling effort.
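
The abstract gives no algorithmic detail, so the following is only a minimal sketch of how a co-training loop over a random forest might look, not the authors' implementation. It assumes depth-1 trees (stumps) in place of full decision trees, a leave-one-out vote in which the other trees pre-label unlabeled samples for the tree being retrained, and an agreement threshold of 0.8. All function names and parameters (`fit_stump`, `vote`, `co_train`, `threshold`, `rounds`) are hypothetical, and the clustering initialization, feature selection, and over-fitting safeguards mentioned in the abstract are omitted.

```python
import random
from collections import Counter

def fit_stump(X, y, rng):
    """Fit a depth-1 tree on a bootstrap sample, splitting on one random feature."""
    n = len(X)
    idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample (with replacement)
    f = rng.randrange(len(X[0]))                 # random feature, as in a random forest
    best = None
    for t in sorted({X[i][f] for i in idx}):     # candidate thresholds
        left = [y[i] for i in idx if X[i][f] <= t]
        right = [y[i] for i in idx if X[i][f] > t]
        l_lab = Counter(left).most_common(1)[0][0]
        r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
        errors = sum(v != l_lab for v in left) + sum(v != r_lab for v in right)
        if best is None or errors < best[0]:
            best = (errors, f, t, l_lab, r_lab)
    return best[1:]                              # (feature, threshold, left label, right label)

def predict_stump(stump, x):
    f, t, l_lab, r_lab = stump
    return l_lab if x[f] <= t else r_lab

def vote(stumps, x):
    """Return (majority label, fraction of stumps that agree with it)."""
    label, count = Counter(predict_stump(s, x) for s in stumps).most_common(1)[0]
    return label, count / len(stumps)

def co_train(X_lab, y_lab, X_unl, n_trees=15, rounds=3, threshold=0.8, seed=0):
    """Train on labeled data, then retrain each tree with pseudo-labels from the others."""
    rng = random.Random(seed)
    forest = [fit_stump(X_lab, y_lab, rng) for _ in range(n_trees)]
    for _ in range(rounds):
        new_forest = []
        for i in range(n_trees):
            others = forest[:i] + forest[i + 1:]     # every tree except tree i
            X_aug, y_aug = list(X_lab), list(y_lab)
            for x in X_unl:
                label, agreement = vote(others, x)
                if agreement >= threshold:           # keep only confident pre-labels
                    X_aug.append(x)
                    y_aug.append(label)
            new_forest.append(fit_stump(X_aug, y_aug, rng))
        forest = new_forest
    return forest
```

A call such as `co_train(X_lab, y_lab, X_unl)` returns the retrained list of stumps, and `vote(forest, x)` then gives the predicted root cause for a new sample `x` together with the share of trees that agree. Pseudo-labels are recomputed from scratch in every round rather than accumulated, which is one simple way to keep an early labeling mistake from being locked in.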
