Aquarius—Enable Fast, Scalable, Data-Driven Service Management in the Cloud

Zhiyuan Yao, Yoann Desmouceaux, Thomas Clausen, Mark Townsley, Juan-Antonio Cordero-Fuertes

doi:10.1109/tnsm.2022.3197130

Abstract

To dynamically manage and update networking policies in cloud data centers, Virtual Network Functions (VNFs) rely on, and therefore actively collect, networking state information. In doing so, they incur additional control signaling and management overhead, especially in larger data centers. Meanwhile, VNFs in production prefer simple distributed heuristics over advanced learning algorithms, to avoid intractable additional processing latency under high-performance, low-latency networking constraints. This paper identifies the challenges of deploying learning algorithms in cloud data centers and proposes Aquarius to bridge machine learning (ML) techniques with distributed systems and service management. Aquarius passively yet efficiently gathers reliable observations and enables the use of ML techniques to collect, infer, and supply accurate networking state information without incurring additional signaling and management overhead. It offers fine-grained, programmable visibility to distributed VNFs and enables both open- and closed-loop control over networking systems. This paper illustrates the use of Aquarius with a traffic classifier, an auto-scaling system, and a load balancer, and demonstrates three ML paradigms within Aquarius (unsupervised, supervised, and reinforcement learning) for network state inference and service management. Testbed evaluations show that Aquarius improves network state visibility and brings notable performance gains in various scenarios with low overhead.

Full Text