Abstract
Cloud computing technology enables uniform access to shared pools of configurable system resources and higher-level services, rapidly provisioned with minimal management effort. Cloud computing relies on sharing resources, through virtualization, to achieve coherence and economies of scale. The cloud network, in particular, is virtualized through multiple logical constructs and software layers, making cloud connectivity complex to configure, debug, and visualize. In this work, we show how to detect cloud network operational issues through monitoring and analytics, by using and enhancing the open-source network analyzer Skydive [2]. In particular, we focus on the Noisy Neighbor Effect, a situation in which a common resource is monopolized by a noisy tenant, resulting in performance degradation experienced by other tenants.
Skydive is an open-source network topology and protocol analyzer, capable of discovering and visualizing cloud network topology across its multiple layers, as well as capturing network traffic at programmable granularity, injecting network traffic, and more. A typical Skydive setup consists of multiple Skydive agents installed on various network components and one or more Skydive analyzers deployed on any compute resource in the cloud. Skydive agents discover and report this information to a Skydive analyzer, which stores it over time so that it can be consumed via the Web UI, command-line tools, and the REST API, for visualization, exploration, and analytics.
In our work, we used Skydive to investigate and detect the Noisy Neighbor Effect in a Kubernetes (k8s) network. Our setup consisted of a commercial cloud platform, IBM Cloud Private (ICP) [1], running an HTTP server and two HTTP clients constantly sending requests to the server; all three are containerized Python applications, as shown in Figure 1. We installed Skydive agents on all the k8s worker nodes.
To achieve our goal of detecting anomalous client behavior and creating a visual indication of such an anomaly in the Skydive UI, we enhanced Skydive's capabilities and contributed our enhancements back to the project, extending the Python REST client library to support traffic injection and fixing existing bugs in the Skydive system. We used those enhancements to measure Round Trip Time (RTT) between nodes in the cloud network, detect anomalies in the RTT measurements, and indicate them in the Skydive UI, such as the green indication in Figure 1. In this work, we have made the first step towards automatic detection of Noisy Neighbor with Skydive, using a simple threshold-based approach in an experimental setup. This work can be extended in multiple ways: supporting a more generic and realistic multi-tenant setup; employing deeper analyses, e.g., machine learning (ML) and deep learning (DL), also on historical data; and exploring additional anomalous cases beyond the Noisy Neighbor Effect.
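
The abstract does not spell out the exact threshold rule, so the following is only a minimal sketch of one plausible threshold-based check over a series of RTT samples. The function name `detect_rtt_anomalies`, the parameters `baseline_window` and `factor`, and the sample values are all hypothetical, chosen for illustration; they are not taken from Skydive or from the measurements in this work. The sketch treats the median of the first few samples as the "quiet" baseline and flags later samples that exceed a multiple of it.

```python
from statistics import median


def detect_rtt_anomalies(rtts_ms, baseline_window=5, factor=3.0):
    """Return indices of RTT samples that exceed factor * baseline.

    Assumption (not from the source): the first `baseline_window` samples
    are taken while no noisy neighbor is active, and their median serves
    as the baseline RTT.
    """
    if len(rtts_ms) < baseline_window:
        raise ValueError("not enough samples to form a baseline")

    baseline = median(rtts_ms[:baseline_window])
    threshold = factor * baseline

    # Only samples after the baseline window are candidates for flagging.
    return [
        i
        for i, rtt in enumerate(rtts_ms)
        if i >= baseline_window and rtt > threshold
    ]


# Synthetic RTT samples in milliseconds (illustrative, not measured data).
samples = [1.0, 1.2, 0.9, 1.1, 1.0, 1.3, 9.5, 10.2, 1.1]
anomalies = detect_rtt_anomalies(samples)
print(anomalies)
```

In a setup like the one described, the flagged indices could then be mapped back to the node pair whose RTT was measured in order to drive the visual indication in the UI. A fixed multiplier is easy to reason about but sensitive to the choice of baseline window, which is one reason the deeper analyses mentioned above may be worth exploring.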