Abstract

Container-based microservice platforms have been widely adopted and developed in recent years. However, as the number of services grows rapidly, ensuring their stability and maintaining their quality has become a research hotspot. Massive call-chain logs and service performance monitoring data make in-depth research on efficient operation and maintenance more feasible. Within this expanded data, tracing data reflects not only the latency of calls between services but also the call relationships between them, so root causes can be located more accurately. In this work, we trained features and obtained anomaly thresholds using semi-supervised learning on the tracing and performance monitoring indicator data of a real microservice application system. We then detected microservice failures using a dynamic sliding window and located root causes with a sorting algorithm based on call tracing. To evaluate the feasibility of the model, we applied it to the public data of the Artificial Intelligence for IT Operations Challenge (AIOps Challenge 2020). The experimental results showed that the model performs well on latency curves with a certain jitter frequency. Furthermore, the accuracy of anomaly detection reached 99%, and the accuracy of root cause localization reached 98.5%.
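
The pipeline described above (learn a latency threshold, flag anomalies over a sliding window, then rank candidate root causes from the call relationships) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a simple mean-plus-k-standard-deviations threshold in place of the semi-supervised model, a fixed-size window in place of the dynamic one, and a ranking by how many anomalous calls end at each service. All function names and parameters (`learn_threshold`, `detect_anomalies`, `rank_root_causes`, `k`, `window`, `min_ratio`) are hypothetical.

```python
from collections import Counter, deque
from statistics import mean, stdev


def learn_threshold(normal_latencies, k=3.0):
    """Return mean + k * stdev of latencies assumed to be normal (stand-in for the learned threshold)."""
    return mean(normal_latencies) + k * stdev(normal_latencies)


def detect_anomalies(latencies, threshold, window=5, min_ratio=0.6):
    """Return indices at which at least min_ratio of the last `window` samples exceed the threshold."""
    recent = deque(maxlen=window)  # fixed-size window; the paper's window is dynamic
    flagged = []
    for i, value in enumerate(latencies):
        recent.append(value > threshold)
        if len(recent) == window and sum(recent) / window >= min_ratio:
            flagged.append(i)
    return flagged


def rank_root_causes(anomalous_calls):
    """Rank callee services by the number of anomalous (caller, callee) calls that end at them."""
    counts = Counter(callee for _caller, callee in anomalous_calls)
    return [service for service, _count in counts.most_common()]


# Illustrative usage with made-up numbers (not data from the paper).
threshold = learn_threshold([10, 11, 9, 10, 12, 10, 9, 11])
flagged = detect_anomalies([10, 11, 50, 52, 49, 51, 10], threshold, window=3)
suspects = rank_root_causes([("web", "db"), ("api", "db"), ("web", "cache")])
```

Requiring a fraction of the window to exceed the threshold, rather than a single sample, is one simple way to tolerate the jitter the abstract mentions; the actual windowing and ranking rules would have to be taken from the full paper.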
