Abstract

Predictive observability has progressed quickly over the last six years, driven by advances in artificial intelligence and machine learning. This paper reviews the progress from 2016 to 2022 in machine learning techniques that use observability data for anomaly detection, forecasting, and reliability prediction. It centers on how these developments have enabled proactive observability, as distinct from reactive observability. The review discusses unsupervised learning methods for anomaly detection, including isolation forests and autoencoders, which aim to flag suspicious entities early, before they cause harm to users. Likewise, LSTMs have proven effective for forecasting time series of vital metrics, so that capacity and performance issues can be predicted and avoided. Failure modeling methods such as survival analysis estimate the risk of failure and provide indicators for reliability improvement. Transformers and adversarial machine learning approaches are identified as breakthroughs that improve predictive power on noisy data. Reported gains include 60-70% better early detection of incidents, a 25-50% decrease in user-affecting failures, and up to 30% shorter mean time to recovery. The paper also reviews companies that have deployed predictive observability at large scale, including Google, Stripe, IBM, and Alibaba. It concludes that over the past six years applied machine learning for predictive observability has grown rapidly, giving teams proactive and preventive management capabilities. Further advances are expected as new ML methods are adopted and deployed in real-world applications.
