Alarm reduction and root cause inference based on association mining in communication network

Min Li,Pengfei Chen,Mengyuan Yang

doi:10.3389/fcomp.2023.1211739

Open Access

https://doi.org/10.3389/fcomp.2023.1211739

Journal: Frontiers in Computer Science	Publication Date: Jul 6, 2023
Citations: 2	License type: CC BY 4.0

Affiliation: Sun Yat-sen University

Abstract

With the growing demand for data computation and communication, communication networks have grown significantly in size and complexity. In a large-scale communication network (e.g., a telecommunication network), hardware and software problems generate a massive volume of alarm events every day; a single serious failure can trigger millions of alarms. These alarms carry crucial information such as the time, content, and device of each exception. As the network expands, the number of components grows and their interactions become more complex, leading to numerous alarm events and complex alarm propagation. Moreover, these alarm events are redundant, and resolving them consumes considerable effort. To reduce alarms and pinpoint their root causes, we propose a data-driven, unsupervised alarm analysis framework that effectively compresses massive alarm events and improves the efficiency of root cause localization. In our framework, an offline learning procedure derives association-based reduction results from a period of historical alarms. An online analysis procedure then matches and compresses real-time alarms and generates root cause groups. We evaluate the framework on real communication network alarms from telecom operators; the results show that our method associates and reduces communication network alarms with an accuracy of more than 91%, removing more than 62% of redundant alarms. We also validate it on fault data from a microservices system, where it achieves an accuracy of 95% in root cause localization. Compared with existing methods, the proposed method is more suitable for operation and maintenance analysis in communication networks.
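
The offline/online split described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes alarms are `(timestamp, alarm_type)` pairs, uses plain pairwise co-occurrence within fixed time windows as a stand-in for association mining, and picks the earliest alarm in each group as the root cause candidate. The function names (`learn_rules`, `compress`), the thresholds, and the alarm type names in the usage note are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def window_transactions(alarms, window):
    """Bucket (timestamp, alarm_type) events into fixed windows of alarm types."""
    buckets = {}
    for ts, kind in alarms:
        buckets.setdefault(ts // window, set()).add(kind)
    return list(buckets.values())


def learn_rules(history, window, min_support, min_confidence):
    """Offline step (assumed): mine pair rules antecedent -> consequent.

    Confidence is co-occurrence count divided by the antecedent's count.
    """
    single, pair = Counter(), Counter()
    for items in window_transactions(history, window):
        single.update(items)
        pair.update(combinations(sorted(items), 2))
    rules = {}
    for (a, b), n in pair.items():
        if n < min_support:
            continue
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = n / single[antecedent]
            if confidence >= min_confidence:
                rules[(antecedent, consequent)] = confidence
    return rules


def compress(live, rules):
    """Online step (assumed): merge live alarms linked by any learned rule.

    Each group reports its earliest alarm as the root cause candidate,
    which is a simple heuristic chosen for this sketch.
    """
    first_seen = {}
    for ts, kind in sorted(live):
        first_seen.setdefault(kind, ts)
    parent = {kind: kind for kind in first_seen}

    def find(kind):
        # Union-find lookup with path halving.
        while parent[kind] != kind:
            parent[kind] = parent[parent[kind]]
            kind = parent[kind]
        return kind

    for a, b in rules:
        if a in parent and b in parent:
            parent[find(a)] = find(b)

    groups = {}
    for kind in first_seen:
        groups.setdefault(find(kind), []).append(kind)
    return [
        {"root_candidate": min(members, key=first_seen.get),
         "members": sorted(members)}
        for members in groups.values()
    ]
```

For example, if a hypothetical history shows `LINK_DOWN` and `BGP_DOWN` co-occurring in most windows, `learn_rules` would link them, and `compress` would then fold a live `LINK_DOWN` followed by `BGP_DOWN` into one group while leaving an unrelated `FAN_FAIL` alarm in a group of its own.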

Full Text