Abstract
Advances in technology and computing power have led to the emergence of complex and large-scale software architectures in recent years. However, such systems are prone to performance anomalies for various reasons, including software bugs, hardware failures, and resource contention. Performance metrics capture only the average load on the system and do not help discover the cause of a problem when abnormal behavior occurs during software execution. Consequently, system experts have to examine a massive amount of low-level tracing data to determine the cause of a performance issue. In this work, we propose an anomaly detection framework that reduces troubleshooting time and guides developers toward performance problems by highlighting anomalous parts of the trace data. Our framework collects streams of system calls during the execution of a process using the Linux Trace Toolkit Next Generation (LTTng) and sends them to a machine learning module that reveals anomalous subsequences of system calls based on their execution times and frequency. Extensive experiments on real datasets from two different applications (MySQL and Chrome), covering varying scenarios in terms of available labeled data, demonstrate the effectiveness of our approach in distinguishing normal sequences from abnormal ones.
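To make the pipeline concrete, the sketch below shows one plausible way to turn system-call subsequences into the two kinds of feature vectors mentioned above (per-call frequency and per-call execution time) and to score them with an off-the-shelf detector. The event format, the system-call vocabulary, and the choice of scikit-learn's IsolationForest are illustrative assumptions, not the authors' actual implementation.

```python
from collections import Counter

import numpy as np
from sklearn.ensemble import IsolationForest

# Assumed system-call vocabulary; a real feature space would be built from
# the calls actually observed in the traces.
SYSCALLS = ["read", "write", "open", "close", "futex", "poll"]

def frequency_features(subseq):
    """Count how often each system call appears in a subsequence of
    (syscall_name, duration_ns) pairs."""
    counts = Counter(name for name, _ in subseq)
    return [counts.get(s, 0) for s in SYSCALLS]

def duration_features(subseq):
    """Total execution time spent in each system call."""
    totals = Counter()
    for name, dur in subseq:
        totals[name] += dur
    return [totals.get(s, 0.0) for s in SYSCALLS]

def detect_anomalies(normal_subseqs, test_subseqs, feature_fn=duration_features):
    """Fit on subsequences from normal executions, then score test subsequences.
    Returns an array where -1 marks an anomalous subsequence and 1 a normal one."""
    X_train = np.array([feature_fn(s) for s in normal_subseqs])
    X_test = np.array([feature_fn(s) for s in test_subseqs])
    model = IsolationForest(random_state=0).fit(X_train)
    return model.predict(X_test)
```

Swapping `feature_fn` between `frequency_features` and `duration_features` mirrors the two feature spaces evaluated in the highlights below.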
Highlights
In recent years, computing infrastructure has evolved significantly, and complex systems now facilitate many complicated and large-scale tasks
We evaluate the performance of the proposed anomaly detection approaches on two different feature spaces, one based on the duration of system calls and another based on their frequency
We deploy MySQL and Chrome processes on virtual machines (VMs) and extract system calls by tracing Linux kernel events to construct the feature vectors (a sketch of this extraction step follows these highlights)
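The following is a minimal sketch of how per-system-call durations might be recovered from an LTTng kernel trace, assuming the Babeltrace 2 Python bindings (bt2) are available. LTTng records system calls as paired syscall_entry_<name> and syscall_exit_<name> events; the trace path and the simplistic pairing by syscall name alone are illustrative assumptions, since a real implementation would match entries and exits per CPU or per thread.

```python
import bt2

def syscall_durations(trace_path):
    """Yield (syscall_name, duration_ns) pairs from an LTTng kernel trace.
    Pairing by syscall name alone is a simplification for illustration."""
    open_calls = {}  # syscall name -> entry timestamp (ns)
    for msg in bt2.TraceCollectionMessageIterator(trace_path):
        if type(msg) is not bt2._EventMessageConst:
            continue
        name = msg.event.name
        ts = msg.default_clock_snapshot.ns_from_origin
        if name.startswith("syscall_entry_"):
            open_calls[name[len("syscall_entry_"):]] = ts
        elif name.startswith("syscall_exit_"):
            syscall = name[len("syscall_exit_"):]
            if syscall in open_calls:
                yield syscall, ts - open_calls.pop(syscall)

# Example usage (the trace path is hypothetical):
# for name, dur in syscall_durations("/home/user/lttng-traces/mysql-session/kernel"):
#     print(name, dur)
```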
Summary
In recent years, computing infrastructure has evolved significantly, and complex systems now facilitate many complicated and large-scale tasks. A simple operation can involve multiple parallel cores and be served within a few seconds or even milliseconds. These improvements have raised users' expectations, so that any performance fluctuation or increased latency may lead to user dissatisfaction and financial loss. Various factors, such as software bugs, misconfigurations, network disconnections, hardware faults, system aging, or even extreme load injected into the system by other applications, may degrade the performance of a particular service or application. Any delay in detecting and troubleshooting performance problems can significantly increase the cost of fixing them.