Abstract

Efficient configuration management plays an important role in managing large-scale clusters in public cloud services. Because a cluster may contain millions of containers, it is challenging to guarantee that configuration updates are delivered to interested containers reliably and in time to meet the requirements of different scenarios. Existing solutions have limitations: consensus algorithms limit the scalability of the cluster and may not work for large clusters, while epidemic algorithms suffer from long-tail latency, so the overall response time is too long to deliver critical configuration updates effectively. To overcome these limitations, we present a novel, flexible approach for delivering configuration updates in large-scale clusters. This pub/sub approach uses a configurable, complete $N$-ary tree as the overlay and introduces flexible, two-phase configuration update delivery. This method of update delivery uses a portion of the subscribers' resources to improve its performance. Furthermore, it tolerates node failures and network partitions. The strategies and the parameters of the overlay can be changed to meet the performance and reliability requirements of different scenarios. Evaluations show that our approach significantly reduces the latency of update delivery compared to existing solutions. It also performs well in cases of node failures and network partitions.
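
To make the overlay idea concrete, the following is a minimal sketch of how a complete $N$-ary tree can be addressed when its nodes are numbered in level order. The level-order numbering and the helper names `parent`, `children`, and `depth` are illustrative assumptions, not details taken from the paper; the sketch only shows why the fan-out $N$ controls how many hops an update needs to reach the deepest node.

```python
from typing import List, Optional


def parent(i: int, n: int) -> Optional[int]:
    """Index of the parent of node i in a level-order complete n-ary tree (hypothetical layout)."""
    return None if i == 0 else (i - 1) // n


def children(i: int, n: int, size: int) -> List[int]:
    """Indices of the children of node i that exist in a tree with `size` nodes."""
    first = n * i + 1
    return [c for c in range(first, first + n) if c < size]


def depth(size: int, n: int) -> int:
    """Number of hops from the root (index 0) to the last node of the tree."""
    hops = 0
    node = size - 1
    while node > 0:
        node = (node - 1) // n  # step up to the parent
        hops += 1
    return hops


if __name__ == "__main__":
    # Assumed example: one million nodes with fan-out 10.
    print(depth(1_000_000, 10))
    print(children(0, 3, 10))
```

Under these assumptions, raising $N$ shortens the tree at the cost of more outgoing messages per node, which is one way the overlay parameters could trade latency against per-node load.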
