Abstract

Modern computing systems, especially cloud-based and cloud-centric ones, typically consist of a large number of components running in large distributed environments with complicated interactions. They are vulnerable to performance problems caused by highly dynamic changes in the runtime environment (e.g., overload and resource contention) or by software bugs (e.g., memory leaks). Unfortunately, diagnosing the root causes of these performance problems at a fine granularity is notoriously difficult because of the complicated interactions and the large set of potential causes. In this paper, we build an automated, black-box, end-to-end cause inference system named CauseInfer to pinpoint the root causes or, at least, provide some hints. CauseInfer automatically maps a distributed system to a two-layer hierarchical causality graph and infers the root causes along the causal paths in that graph. CauseInfer models the fault propagation paths explicitly and works without instrumenting the running production system, which makes it more effective and practical than previous approaches. Experimental evaluations on two benchmark systems show that CauseInfer identifies the root causes with high accuracy. Compared to several state-of-the-art approaches, CauseInfer achieves an improvement of over 10 percent. Moreover, CauseInfer is lightweight and flexible enough to scale out readily in large distributed systems. With CauseInfer, the mean time to recovery (MTTR) of cloud systems can be significantly reduced.
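
As a rough illustration of inferring root causes along causal paths, the following minimal sketch walks a causality graph backwards from an observed symptom and reports the anomalous nodes that have no anomalous cause of their own. It is not CauseInfer's actual algorithm: the function name `infer_root_causes`, the metric names, the flattening of the two-layer graph into a single parent map, and the assumption that anomalous metrics have already been detected are all hypothetical choices made for this example.

```python
from typing import Dict, List, Set


def infer_root_causes(
    parents: Dict[str, List[str]],
    anomalous: Set[str],
    symptom: str,
) -> List[str]:
    """Walk causal edges backwards from the symptom, following only anomalous nodes.

    parents maps each node to its direct causes (hypothetical input format).
    A node is reported as a root-cause candidate when it is anomalous and
    none of its direct causes is anomalous.
    """
    roots: List[str] = []
    visited: Set[str] = set()
    stack = [symptom]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        anomalous_parents = [p for p in parents.get(node, []) if p in anomalous]
        if anomalous_parents:
            # The anomaly can be explained further upstream; keep walking.
            stack.extend(anomalous_parents)
        elif node in anomalous:
            # No anomalous cause remains, so this node is a candidate root cause.
            roots.append(node)
    return roots


# Hypothetical example: a slow web tier caused by a memory problem on the database.
example_parents = {
    "web.response_time": ["db.response_time", "web.cpu"],
    "db.response_time": ["db.memory", "db.disk_io"],
}
example_anomalous = {"web.response_time", "db.response_time", "db.memory"}
print(infer_root_causes(example_parents, example_anomalous, "web.response_time"))
```

In this toy graph the walk follows web.response_time to db.response_time to db.memory and stops there, because db.memory is anomalous and has no anomalous cause. How the graph is built and how anomalies are detected are outside the scope of this sketch.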
