A fine-grained robust performance diagnosis framework for run-time cloud applications

Ruyue Xin, Peng Chen, Paola Grosso, Zhiming Zhao

doi:10.1016/j.future.2024.02.014

Abstract

To maintain the required service quality of time-critical cloud applications, operators must continuously monitor their runtime status, detect potential performance anomalies, and diagnose the root causes of these anomalies effectively. However, existing performance diagnosis methods face challenges such as the need for high-quality labeled data, the low reusability and robustness of performance anomaly detection models, and the absence of real-time, fine-grained root cause localization. These challenges make it difficult to fix performance issues quickly and to develop effective adaptation decisions. We propose a Fine-grained Robust Performance Diagnosis (FIRED) framework to tackle these challenges. The framework offers a metrics selection component that filters noise and improves detection efficiency, and an anomaly detection component that combines several well-selected base models with a deep neural network and adopts weakly supervised learning, since few labels are available in practice. The framework also employs a real-time, fine-grained root cause localization component to locate the resource metrics on which a performance anomaly depends. Our experiments show that the framework can effectively reduce data noise and achieve the best accuracy and algorithm robustness for performance anomaly detection. In addition, the framework can accurately localize the first root causes, with an average accuracy higher than 0.7 for locating the first four root cause metrics.
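The three stages named in the abstract (metrics selection, ensemble anomaly detection, root cause localization) can be pictured with a toy pipeline. The sketch below is only an illustration under strong assumptions and is not the FIRED implementation: it swaps in a standard-deviation filter for metrics selection, two z-score threshold rules with averaged votes for the ensemble (the paper instead combines base models with a deep neural network under weak supervision), and a ranking by deviation for root cause localization. All function names, metric names, and thresholds are hypothetical.

```python
from statistics import mean, pstdev

def select_metrics(train, min_std=1e-6):
    """Keep metrics that vary in training data (stand-in for noise filtering)."""
    return [name for name, values in train.items() if pstdev(values) > min_std]

def zscores(train, sample, metrics):
    """Absolute z-score of each selected metric in `sample` vs. training data."""
    return {
        m: abs(sample[m] - mean(train[m])) / pstdev(train[m])
        for m in metrics
    }

def detect(scores, threshold=3.0):
    """Average the votes of two toy base detectors (hypothetical rules)."""
    max_vote = 1.0 if max(scores.values()) > threshold else 0.0
    mean_vote = 1.0 if mean(scores.values()) > threshold / 2 else 0.0
    return (max_vote + mean_vote) / 2 >= 0.5

def rank_root_causes(scores, top_k=4):
    """Rank metrics by deviation; the most deviant are candidate root causes."""
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Hypothetical training data: resource metrics observed during normal runs.
train = {
    "cpu": [0.30, 0.32, 0.31, 0.29, 0.33],
    "mem": [0.50, 0.52, 0.51, 0.49, 0.48],
    "disk_io": [10, 12, 11, 9, 13],
    "const": [1, 1, 1, 1, 1],  # constant metric, carries no signal
}
sample = {"cpu": 0.95, "mem": 0.51, "disk_io": 30, "const": 1}

selected = select_metrics(train)
scores = zscores(train, sample, selected)
if detect(scores):
    print("anomaly; candidate root causes:", rank_root_causes(scores, top_k=2))
```

In this toy setting the constant metric is dropped before detection, and the ranking step only runs once the ensemble flags the sample, which mirrors the order of the components described in the abstract.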

Journal: Future Generation Computer Systems
Publication Date: Feb 17, 2024
Citations: 1

Similar Papers

Clustering Performance Anomalies Based on Similarity in Processing Time Changes
Satoshi Iwata ... Kenji Kono
IPSJ Online Transactions | VOL. 5
01 Jan 2012

Multitier Web System Reliability: Identifying Causative Metrics and Analyzing Performance Anomaly Using a Regression Model.
Sundeuk Kim ... Jong Seon Kim
Sensors (Basel, Switzerland) | VOL. 23
08 Feb 2023

Anomaly Detection in Clouds
Kejiang Ye
08 Apr 2017

Reference-driven performance anomaly identification
Kai Shen ... Christopher Stewart
15 Jun 2009
