Stateful Posted Pricing with Vanishing Regret via Dynamic Deterministic Markov Decision Processes

Yuval Emek,Rad Niazadeh,Ron Lavi,Yangguang Shi

doi:10.1287/moor.2023.1375

Abstract

An online problem called dynamic resource allocation with capacity constraints (DRACC) is introduced and studied in the realm of posted price mechanisms. This problem subsumes several applications of stateful pricing, including but not limited to posted prices for online job scheduling and matching over a dynamic bipartite graph. Because existing online learning techniques do not yield vanishing regret for this problem, we develop a novel online learning framework over deterministic Markov decision processes with dynamic state transition and reward functions. Following that, we prove, based on a reduction to the well-studied problem of online learning with switching costs, that if the Markov decision process admits a chasing oracle (i.e., an oracle that simulates any given policy from any initial state with bounded loss), then the online learning problem can be solved with vanishing regret. Our results for the DRACC problem and its applications are then obtained by devising (randomized and deterministic) chasing oracles that exploit the particular structure of these problems. Funding: The work of Y. Emek was supported in part by the Israel Science Foundation [Grant 1016/17]. The work of R. Lavi was partially supported by the Israel Science Foundation [Grant 2560/17] and the National Natural Science Foundation of China [Grant 2560/17]. The work of R. Niazadeh was supported by the University of Chicago Booth School of Business. The work of Y. Shi was partially supported at the Technion by the Council for Higher Education [fellowship], and at Shandong University by the Science Fund Program of Shandong Province for Distinguished Oversea Young Scholars [Grant 2023HWYQ-006].

Full Text