Abstract
As IT infrastructures grow, applications and systems generate an ever-increasing volume of data, and traditional IT management solutions cannot keep up with its volume and complexity. Artificial intelligence for IT operations (AIOps) is an effective approach that can simplify IT operations management and accelerate and automate problem resolution in complex modern IT environments. Alarm root cause location is an important scenario and a key function of AIOps. Existing research focuses mainly on mining associations from historical alarm data, deriving alarm rules, and compressing alarms offline. Practical applications, however, require faster and more accurate root cause location, which places higher demands on real-time performance. The online real-time alarm root cause location algorithm proposed in this paper explores the relationships between alarms in both time and space, and achieves online alarm data compression through techniques such as alarm association time, alarm event division, and alarm event topology generation. Accurate real-time division of alarm events and real-time location of key alarms greatly improve the speed and accuracy of root cause location. The algorithm has been deployed on a mobile network operator's 5G network management system, where it processes over 10,000 alarms per minute and achieves a root cause location accuracy of 85%, showing good results in online operation.
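To make the idea of dividing alarms into events by time and topology more concrete, the following is a minimal illustrative sketch and not the algorithm proposed in the paper. It assumes that two alarms are related when they occur within a fixed association window and on the same or directly connected nodes, and it uses the earliest alarm in an event as a stand-in root cause candidate. The names `Alarm`, `divide_events`, `pick_root_candidate`, the node labels, and the 5-second window are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    alarm_id: str
    node: str         # network element that raised the alarm (hypothetical label)
    timestamp: float  # seconds since an arbitrary origin


def related(a, b, topology, assoc_window):
    """Assumed rule: close in time AND on the same or adjacent nodes."""
    close_in_time = abs(a.timestamp - b.timestamp) <= assoc_window
    adjacent = (
        a.node == b.node
        or b.node in topology.get(a.node, set())
        or a.node in topology.get(b.node, set())
    )
    return close_in_time and adjacent


def divide_events(alarms, topology, assoc_window):
    """Group alarms into events, processing them in time order as a stream would."""
    events = []
    for alarm in sorted(alarms, key=lambda x: x.timestamp):
        # Find every existing event that contains an alarm related to this one.
        matches = [
            ev for ev in events
            if any(related(alarm, other, topology, assoc_window) for other in ev)
        ]
        if not matches:
            events.append([alarm])
            continue
        # The new alarm may bridge several events, so merge them into one.
        merged = [a for ev in matches for a in ev] + [alarm]
        events = [ev for ev in events if not any(ev is m for m in matches)]
        events.append(merged)
    return events


def pick_root_candidate(event):
    """Placeholder heuristic: the earliest alarm in the event."""
    return min(event, key=lambda a: a.timestamp)


# Hypothetical topology: node "gnb-1" is connected to "cell-1".
topology = {"gnb-1": {"cell-1"}}
alarms = [
    Alarm("a1", "gnb-1", 0.0),
    Alarm("a2", "cell-1", 2.0),
    Alarm("a3", "gnb-9", 3.0),
    Alarm("a4", "cell-1", 100.0),
]
events = divide_events(alarms, topology, assoc_window=5.0)
roots = [pick_root_candidate(ev).alarm_id for ev in events]
```

In this toy input, `a1` and `a2` fall into one event because they are adjacent and close in time, while `a3` (unconnected node) and `a4` (outside the window) each form their own event. Reporting one candidate per event rather than every raw alarm is the sense in which such grouping compresses the alarm stream.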