We study a family of dynamic resource allocation problems in which requests of different types arrive over time and are accepted or rejected. Each request type is characterized by its reward, arrival probability, and resource consumption. An upper bound on the collected reward is given by a linear optimization problem with a random right-hand side. This type of problem, known as a packing linear program (LP), is ubiquitous in resource allocation. We provide a detailed characterization of the parametric structure of this packing LP. Relying on this geometric understanding, we revisit and expand on BudgetRatio algorithms, which achieve constant regret by re-solving this same packing LP in each period and accepting requests scored as sufficiently valuable. We illustrate the benefits of the geometric view by proving that (i) BudgetRatio achieves constant regret relative to the offline (full-information) upper bound in the presence of inventory that is (slowly) restocked; (ii) within explicitly identifiable bounds, the algorithm’s regret is robust to misspecification of the model parameters, which gives bounds for the bandits version of the problem in which the parameters have to be learned; and (iii) the algorithm has an equivalent formulation as a generalized bid-price algorithm in which the bid prices can be adaptively and efficiently computed. Our analysis focuses on the evolution of the remaining inventory, and in turn of the LP that drives BudgetRatio, as a stochastic process. We prove that it is attracted to sticky regions of the state space in which the online algorithm takes actions consistent with the optimal basis of the offline upper bound, a basis that is revealed only in hindsight at the horizon’s end.

Funding: This work was supported by the U.S. Department of Defense [Grant W911NF-20-C-0008].
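For concreteness, one common way to write the offline (full-information) packing LP referred to in the abstract is sketched below. The notation is an assumption made here for illustration and is not taken from the source: $n$ request types, $m$ resources, reward $r_j$, consumption $a_{ij}$ of resource $i$ by a type-$j$ request, initial inventory $b_i$, and $Z_j$ the random number of type-$j$ arrivals over the horizon. The randomness of $Z_j$ is what makes the right-hand side random.

```latex
% Illustrative notation only (assumed, not from the source):
%   y_j  = number of type-j requests accepted
%   r_j  = reward per accepted type-j request
%   a_ij = units of resource i consumed by one type-j request
%   b_i  = initial inventory of resource i
%   Z_j  = random number of type-j arrivals over the horizon
\begin{align*}
\max_{y}\quad & \sum_{j=1}^{n} r_j\, y_j \\
\text{s.t.}\quad & \sum_{j=1}^{n} a_{ij}\, y_j \le b_i, && i = 1,\dots,m, \\
& 0 \le y_j \le Z_j, && j = 1,\dots,n.
\end{align*}
```

Under this reading, an online policy that re-solves the LP each period would replace $b_i$ with the remaining inventory and $Z_j$ with the expected number of remaining arrivals. The abstract does not spell out the precise scoring rule BudgetRatio uses to accept requests, so it is not reproduced here.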