Abstract

We consider the problem of optimally assigning $p$ sniffers to $K$ channels to monitor transmission activity in a multichannel wireless network with switching costs. User activity is initially unknown to the sniffers and must be learned while channel assignment decisions are being made, which gives rise to the fundamental tradeoff between exploration and exploitation. Switching costs are incurred whenever sniffers change their channel assignments, so frequent changes are undesirable. We formulate sniffer-channel assignment with switching costs as a linear partial monitoring problem, a superclass of multi-armed bandits. Because the number of arms (sniffer-channel assignments) is exponential, novel techniques are needed for efficient learning. We use the linear bandit model to capture the dependency among the arms and develop a policy that exploits this dependency. We prove that the proposed Upper Confidence Bound (UCB)-based policy enjoys a regret bound that is logarithmic in time $t$ and depends sublinearly on the number of arms, while its total switching cost grows on the order of $O(\log\log(t))$.
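
To illustrate why a linear model helps when the arm set is exponential, the following is a minimal sketch and not the paper's policy. It assumes, purely for illustration, that each sniffer monitors exactly one channel, that the reward of an assignment is additive over (sniffer, channel) pairs, and that the switching cost is the number of sniffers that change channel. The helper names (`arm_vector`, `ucb_index`, `best_assignment`, `switching_cost`) and the UCB1-style index are hypothetical choices, not taken from the source.

```python
import math

def arm_vector(assignment, K):
    """One-hot encode an assignment (one channel index per sniffer) as a p*K vector."""
    x = [0] * (len(assignment) * K)
    for sniffer, channel in enumerate(assignment):
        x[sniffer * K + channel] = 1
    return x

def ucb_index(mean, count, t):
    """UCB1-style index (an assumed form); unobserved pairs get +inf to force exploration."""
    if count == 0:
        return math.inf
    return mean + math.sqrt(2.0 * math.log(t) / count)

def best_assignment(index, p, K):
    """Maximise a linear score over all K**p assignments.

    Under the additive assumption the maximisation decomposes into
    p independent argmaxes over K channels, so no enumeration is needed.
    """
    return tuple(max(range(K), key=lambda c: index[s * K + c]) for s in range(p))

def switching_cost(prev, new):
    """Assumed cost model: number of sniffers whose channel changes."""
    return sum(a != b for a, b in zip(prev, new))

p, K = 3, 4
num_arms = K ** p   # number of distinct assignments grows exponentially in p
dim = p * K         # number of unknown parameters grows only linearly in p
```

Under these assumptions, the policy only has to estimate `dim` parameters rather than one value per arm, which is the kind of dependency among arms that a linear bandit model captures.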

