Dynamic Application Call Graph Formation and Service Identification in Cloud Data Centers

Mona Elsaadawy, Jifeng Wang, Xinchen Hou, Mohamed Younis, Bettina Kemme

doi:10.1109/tnsm.2022.3201095

Abstract

Monitoring distributed service-based cloud applications and understanding the interactions among their components are crucial for diagnosing and resolving performance issues. However, many existing cloud monitoring systems require sophisticated application and/or platform instrumentation and cannot be deployed on demand, or they provide only partial functionality. In most cases, monitoring also comes with significant overhead. To overcome these shortcomings, this paper presents DyMonD, a holistic framework that Dynamically Monitors an application, Discovers its service components, and visualizes them in the form of a call graph together with performance metrics such as throughput. DyMonD is completely decoupled from the internals of the applications and the services themselves: it deploys monitoring agents transparently at the software switches within the network, extracts all necessary information from the messages exchanged by the components, and performs service identification by applying deep learning to the network flows. Our evaluation shows that DyMonD has significantly less overhead than existing tools, reducing the monitoring-induced impact on response time by up to 89% and reducing resource consumption such as CPU and memory usage by up to 75%. Furthermore, DyMonD’s deep learning module identifies services up to 11% more accurately than competing models.
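
To illustrate the idea of a call graph annotated with throughput, the following is a minimal sketch and not DyMonD's implementation. It assumes that a monitoring agent has already reduced the observed traffic to simple flow records; the `FlowRecord` type, the `build_call_graph` function, and the service names are hypothetical and chosen only for illustration. Service identification by deep learning is not shown.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    """Hypothetical summary of one observed network flow."""
    src: str       # calling component
    dst: str       # called component
    requests: int  # requests observed during the monitoring window


def build_call_graph(flows, window_seconds):
    """Aggregate flow records into edges weighted by requests per second."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    totals = defaultdict(int)
    for flow in flows:
        # Flows between the same pair of components collapse into one edge.
        totals[(flow.src, flow.dst)] += flow.requests
    return {edge: count / window_seconds for edge, count in totals.items()}


# Hypothetical flows observed during a 10-second window.
flows = [
    FlowRecord("web", "auth", 120),
    FlowRecord("web", "db", 300),
    FlowRecord("web", "auth", 80),
]
graph = build_call_graph(flows, window_seconds=10)
for (src, dst), throughput in sorted(graph.items()):
    print(f"{src} -> {dst}: {throughput:.1f} req/s")
```

Each key of the resulting dictionary is a directed edge (caller, callee), and each value is the throughput of that edge over the window, which is the kind of information a call graph visualization would display.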

Full Text