Failure detection is a key part of failure management, since operating under failure conditions has serious consequences for network operators. Machine learning (ML) is widely applied to failure management in optical networks, and neural networks (NNs) have attracted particular attention, becoming the most extensively applied ML algorithm. However, the black-box nature of NNs makes it difficult to interpret or analyze why and how they reach their outputs. In this paper, we propose a cause-aware failure detection scheme for optical transport network (OTN) boards based on the interpretable extreme gradient boosting (XGBoost) algorithm. The feature importance ranking produced by XGBoost identifies the features most relevant to equipment failure. SHapley Additive exPlanations (SHAP) is then applied to resolve the inconsistency in feature attribution among three common global feature importance measures of XGBoost; it yields a consistent attribution by calculating the contribution (SHAP value) of each input feature to the XGBoost detection result. Based on the feature importance ranking by SHAP values, the features most related to failures of two types of OTN boards are confirmed, enabling the identification of failure causes. Moreover, we evaluate the failure detection performance on two types of OTN boards, for which the practical data are balanced and unbalanced, respectively. Experimental results show that the proposed scheme achieves an F1 score higher than 98% for both types of OTN boards, and that the features most relevant to failures of the two board types, as confirmed by SHAP values, are the average and the maximum environment temperature, respectively.
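To make the notion of a per-feature contribution concrete, the following is a minimal sketch of an exact Shapley value computation, the quantity that SHAP values are built on. It is not the scheme described above: the paper applies SHAP to a trained XGBoost model, whereas this sketch enumerates all feature subsets by brute force on a toy scoring function. The function `toy_failure_score`, the feature names, the input values, and the baseline are all hypothetical, and treating an "absent" feature as equal to its baseline value is a simplifying assumption.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    A feature that is "absent" from a coalition takes its baseline value
    (a simplifying assumption made for this sketch).
    """
    names = list(x)
    n = len(names)
    phi = {}
    for name in names:
        others = [f for f in names if f != name]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude `name`.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                without = {f: (x[f] if f in subset else baseline[f]) for f in names}
                with_f = dict(without)
                with_f[name] = x[name]
                # Marginal contribution of `name` when added to this coalition.
                total += weight * (model(with_f) - model(without))
        phi[name] = total
    return phi


def toy_failure_score(f):
    # Hypothetical score with one interaction term; NOT a trained model.
    return (0.04 * f["avg_env_temp"]
            + 0.01 * f["max_env_temp"]
            + 0.001 * f["avg_env_temp"] * f["laser_bias_current"])


# Hypothetical board reading and reference ("baseline") reading.
x = {"avg_env_temp": 45.0, "max_env_temp": 60.0, "laser_bias_current": 30.0}
baseline = {"avg_env_temp": 25.0, "max_env_temp": 30.0, "laser_bias_current": 20.0}

phi = shapley_values(toy_failure_score, x, baseline)

# Rank features by absolute contribution, largest first.
ranking = sorted(phi, key=lambda name: abs(phi[name]), reverse=True)
for name in ranking:
    print(f"{name}: {phi[name]:.4f}")
```

The contributions sum to the difference between the score of the input and the score of the baseline, which is what allows a ranking of this kind to be read as a consistent attribution. Brute-force enumeration grows exponentially with the number of features, so it is only practical for toy cases like this one.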