Abstract

Today's Internet must support applications with increasingly dynamic and heterogeneous connectivity requirements, such as video streaming and the Internet of Things. Yet current network management practices generally rely on pre-specified network configurations, which may not be able to cope with dynamic application needs. Moreover, even the best-specified policies will struggle to cover all possible scenarios, given applications' increasing heterogeneity and dynamic network conditions, e.g., on volatile wireless links. In this work, we instead propose a model-free learning approach that finds the optimal network policies for current network flow requirements. This approach is attractive because no comprehensive models exist for how different policy choices affect flow performance under changing network conditions. However, it raises new challenges for online learning algorithms: policy configurations can affect the performance of multiple flows sharing the same network resources, and this performance coupling limits the scalability and optimality of existing online learning algorithms. We therefore extend multi-armed bandit frameworks to propose new online learning algorithms for protocol selection with provably sublinear regret under certain conditions. We validate the optimality and scalability of our algorithms through data-driven simulations and testbed experiments.
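
To make the multi-armed bandit framing concrete, the sketch below shows a standard UCB1 learner choosing among candidate protocol configurations. This is a minimal illustration only: the arm names and simulated reward function are hypothetical stand-ins, and the paper's contribution lies in extending this basic framework to handle performance coupling across flows, which plain UCB1 does not address.

```python
# Illustrative sketch only: a standard UCB1 bandit selecting among candidate
# protocol/policy configurations. The arm names and simulated reward below
# are hypothetical, not the authors' algorithm, which extends this framework
# to flows that share network resources.
import math
import random

ARMS = ["tcp_cubic", "tcp_bbr", "quic"]  # hypothetical policy choices

def simulated_throughput(arm: str) -> float:
    """Stand-in for observed flow performance, normalized to [0, 1]."""
    means = {"tcp_cubic": 0.55, "tcp_bbr": 0.70, "quic": 0.62}
    return min(1.0, max(0.0, random.gauss(means[arm], 0.1)))

def ucb1(horizon: int = 10_000) -> None:
    counts = {a: 0 for a in ARMS}   # times each arm has been played
    values = {a: 0.0 for a in ARMS}  # running mean reward per arm

    for t in range(1, horizon + 1):
        if t <= len(ARMS):
            arm = ARMS[t - 1]  # play each arm once to initialize
        else:
            # Pick the arm maximizing mean + exploration bonus; the bonus
            # shrinks as an arm is sampled, yielding sublinear regret.
            arm = max(ARMS, key=lambda a: values[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        reward = simulated_throughput(arm)
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean

    print({a: (counts[a], round(values[a], 3)) for a in ARMS})

if __name__ == "__main__":
    ucb1()
```

Over the horizon, the learner concentrates its plays on the best-performing configuration while its cumulative regret grows only sublinearly in the number of rounds, which is the guarantee the abstract refers to.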
