Abstract

With the advancement of technologies such as 5G, cloud computing, and microservices, network management systems have grown far more complex and the variety of their technical components has greatly increased. As a result, traditional operations and maintenance methods can no longer meet current monitoring and maintenance demands. Artificial intelligence for IT operations (AIOps), which harnesses AI and big data technologies, has emerged as a solution. AIOps plays a crucial role in enhancing service quality and customer satisfaction, boosting engineering productivity, and reducing operational costs. This article examines the primary tasks involved in AIOps, such as anomaly detection and log fault analysis and classification. A significant challenge in many AIOps tasks is the scarcity of fault sample data, which makes these tasks a natural fit for few-shot learning. Inspired by model-agnostic meta-learning (MAML), we propose a new anomaly detector, MAML-KAD, for application in various AIOps tasks. Our observations confirm that meta-learning algorithms effectively enhance AIOps tasks, which points to wide-ranging application prospects for meta-learning in the field of AIOps. Moreover, we introduce an AIOps platform that embeds meta-learning within its diagnostic core and features streamlined log collection, caching, and alerting to automate the AIOps workflow.
