Abstract

Recently, the online matching problem has attracted much attention due to its wide application on real-world decision-making scenarios. In stationary environments, by adopting the stochastic user arrival model, existing methods are proposed to learn dual optimal prices and are shown to achieve a fast regret bound. However, the stochastic model is no longer a proper assumption when the environment is changing, leading to an optimistic method that may suffer poor performance. In this paper, we study the online matching problem in dynamic environments in which the dual optimal prices are allowed to vary over time. We bound the dynamic regret of online matching problem by the sum of two quantities, including a regret of online max-min problem and a dynamic regret of online convex optimization (OCO) problem. Then we propose a novel online approach named Primal-Dual Online Algorithm (PDOA) to minimize both quantities. In particular, PDOA adopts the primal-dual framework by optimizing dual prices with the online gradient descent (OGD) algorithm to eliminate the online max-min problem's regret. Moreover, it maintains a set of OGD experts and combines them via an expert-tracking algorithm, which gives a sublinear dynamic regret bound for the OCO problem. We show that PDOA achieves an O(K sqrt{T(1+P_T)}) dynamic regret where K is the number of resources, T is the number of iterations and P_T is the path-length of any potential dual price sequence that reflects the dynamic environment. Finally, experiments on real applications exhibit the superiority of our approach.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call