Abstract

Today's Internet must support applications with increasingly dynamic and heterogeneous connectivity requirements, such as video streaming and the Internet of Things. Yet current network management practices generally rely on pre-specified network configurations, which may not be able to cope with dynamic application needs. Moreover, even the best-specified policies are unlikely to cover all possible scenarios, given applications' increasing heterogeneity and dynamic network conditions, e.g., on volatile wireless links. In this work, we instead propose a model-free learning approach that finds the optimal network policies for current network flow requirements. This approach is attractive because no comprehensive models exist for how different policy choices affect flow performance under changing network conditions. However, it raises new challenges for online learning: policy configurations can affect the performance of multiple flows sharing the same network resources, and this performance coupling limits the scalability and optimality of existing online learning algorithms. We extend multi-armed bandit frameworks to propose new online learning algorithms for protocol selection with provably sublinear regret under certain conditions, and we validate the optimality and scalability of these algorithms through data-driven simulations and testbed experiments.
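
The abstract does not detail the proposed algorithms, but the underlying multi-armed bandit idea can be illustrated with a classical baseline. Below is a minimal sketch, assuming a UCB1-style learner choosing among hypothetical protocol configurations, with a simulated reward standing in for measured flow performance; the protocol names and reward function are illustrative assumptions, and the paper's actual algorithms additionally handle the performance coupling between flows, which this sketch ignores. UCB1's regret grows only logarithmically in the number of decisions, i.e., sublinearly.

```python
import math
import random

# Illustrative sketch only: a UCB1-style bandit for protocol selection.
# The arm names and reward model below are hypothetical stand-ins, not
# the paper's algorithm, which must also account for flows that share
# network resources and therefore couple each other's rewards.

ARMS = ["tcp_cubic", "tcp_bbr", "quic"]  # hypothetical protocol choices


def measure_flow_reward(arm: str) -> float:
    """Stand-in for observed flow performance (e.g., normalized throughput)."""
    base = {"tcp_cubic": 0.55, "tcp_bbr": 0.70, "quic": 0.65}[arm]
    return min(1.0, max(0.0, random.gauss(base, 0.1)))


def ucb1(horizon: int) -> dict:
    counts = {a: 0 for a in ARMS}   # pulls per arm
    means = {a: 0.0 for a in ARMS}  # empirical mean reward per arm

    # Play each arm once to initialize its estimate.
    for a in ARMS:
        counts[a] = 1
        means[a] = measure_flow_reward(a)

    for t in range(len(ARMS) + 1, horizon + 1):
        # UCB1 index: empirical mean plus an exploration bonus that
        # shrinks as an arm accumulates observations.
        arm = max(ARMS, key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]))
        r = measure_flow_reward(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # running average
    return counts


if __name__ == "__main__":
    # Over a long horizon, pulls concentrate on the best-performing arm.
    print(ucb1(horizon=2000))
```

The exploration bonus is what yields sublinear regret: under-sampled protocols keep being tried occasionally, so the learner cannot lock onto a suboptimal choice, yet exploration decays fast enough that most decisions eventually go to the best arm.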
