Abstract
Container-based microservice platforms have been increasingly adopted and developed in recent years. However, with the rapid expansion of services, ensuring their stability and maintaining their quality has become a research hotspot. The massive volume of call-chain logs and service performance monitoring data makes in-depth research on efficient operation and maintenance increasingly feasible. Within this data, tracing data reflects not only the latency of calls between services but also the call relationships between them, so root causes can be located more accurately. In this work, we trained features and obtained anomaly thresholds using semi-supervised learning, based on the tracing data and performance monitoring indicators of a real microservice application system. We then detected microservice failures using a dynamic sliding window and located root causes with a sorting algorithm based on call tracing. To evaluate the feasibility of the model, we applied it to the public data of the Artificial Intelligence for IT Operations Challenge (AIOps Challenge 2020). The experimental results show that the model performs well on latency curves with a certain jitter frequency. Furthermore, the accuracy of anomaly detection reached 99%, and the accuracy of root cause location reached 98.5%.
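As a rough illustration of the two stages named above (sliding-window detection followed by trace-based sorting of candidate root causes), the following minimal sketch may help. The abstract does not specify the thresholding rule or the ranking criterion, so everything here is an assumption made for illustration: the mean-plus-k-standard-deviations rule, the `window` and `k` parameters, and ranking by the number of anomalous calls that end at each callee. The function names `detect_anomalies` and `rank_root_causes` are hypothetical and do not come from the paper.

```python
from collections import Counter, deque
from statistics import mean, pstdev


def detect_anomalies(latencies, window=5, k=3.0):
    """Return indices whose latency exceeds mean + k * std of the preceding window.

    Assumption: a simple statistical threshold stands in for the learned
    thresholds described in the abstract.
    """
    recent = deque(maxlen=window)  # baseline of the most recent normal points
    anomalies = []
    for i, value in enumerate(latencies):
        if len(recent) == window:
            threshold = mean(recent) + k * pstdev(recent)
            if value > threshold:
                anomalies.append(i)
                continue  # keep anomalous points out of the baseline
        recent.append(value)
    return anomalies


def rank_root_causes(anomalous_calls):
    """Sort callee services by how many anomalous (caller, callee) calls reach them.

    Assumption: call counts are used as the sorting key; the paper's actual
    criterion is not given in the abstract.
    """
    counts = Counter(callee for _caller, callee in anomalous_calls)
    return [service for service, _count in counts.most_common()]


if __name__ == "__main__":
    series = [10, 11, 10, 12, 11, 50, 10, 11]  # made-up latencies in ms
    print(detect_anomalies(series))
    print(rank_root_causes([("web", "db"), ("web", "cache"), ("api", "db")]))
```

In this sketch the window only advances over points judged normal, so a single spike does not inflate the baseline used for the points that follow it.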