We consider a price-based revenue management problem with a finite number of reusable resources over a finite time horizon. Customers arrive stochastically, request exponentially distributed service times, and may balk or renege when resources are insufficient. Once a resource unit finishes serving a customer, it is released and can immediately serve the next customer. The arrival, service, balking, and reneging rates all depend on the offered price. We assume that the firm does not know the mappings from prices to these rates. It therefore makes adaptive pricing decisions in each period, based only on past sales, to maximize cumulative revenue. We propose two new multi-armed bandit (MAB) based learning algorithms for finding near-optimal pricing policies: the Batch Upper Confidence Bound (BUCB) algorithm and the Batch Thompson Sampling (BTS) algorithm. Compared with the prior pricing and MAB literature, the salient difficulties of this problem lie in (i) the unknown mapping between rates and prices, (ii) the dynamic nature of reusable resources being committed over time, (iii) the transient behavior of the service system when the price changes, and (iv) the unbounded and heavy-tailed distributions of the observed random variables. Our algorithms contain a Warm-up Phase to eliminate heavy-tail effects and a Learning Phase to identify the optimal price. The Learning Phase is divided into successive operational batches. In each batch, a price is selected from a prescribed set using the sales data collected in previous batches. The performance measure is cumulative regret, defined as the difference between the revenue of a clairvoyant optimal pricing policy with full distributional information and the revenue attained by our approach.
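The batch structure of the Learning Phase can be illustrated with a toy simulation. The sketch below is a hypothetical, simplified stand-in, not the authors' BUCB or BTS. It makes several assumptions. There is a finite price list, and a user-supplied sampler returns the per-period revenues of one batch. The first `burn_in` periods of each batch are discarded, a hypothetical device for the transient after a price change and not the paper's Warm-up Phase. Prices are ranked either by a UCB index with an assumed constant `scale` or by Gaussian Thompson samples. The toy demand model `toy_batch_revenue` (Poisson demand with rate $10e^{-p/5}$) ignores reusable capacity, balking, and reneging entirely.

```python
import math
import random


def poisson(lam, rng):
    """Knuth's multiplication method for a Poisson(lam) draw; fine for small lam."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def toy_batch_revenue(price, length, rng):
    """Hypothetical environment: i.i.d. Poisson demand with rate 10*exp(-price/5) per period."""
    lam = 10.0 * math.exp(-price / 5.0)
    return [price * poisson(lam, rng) for _ in range(length)]


def run_batched_bandit(prices, sample_batch, num_batches, batch_len, burn_in, scale, rule, rng):
    """Toy batched bandit over a finite price set (illustrative only).

    sample_batch(price, length) -> list of per-period revenues for one batch.
    rule is "ucb" (optimistic index) or "ts" (Gaussian Thompson sampling).
    One price is held fixed for a whole batch; decisions use only earlier batches.
    """
    P = len(prices)
    counts = [0] * P        # number of batches in which each price was offered
    means = [0.0] * P       # running mean of the post-burn-in batch-average revenue
    log_horizon = math.log(num_batches * batch_len)
    history = []
    for b in range(num_batches):
        if b < P:
            arm = b  # offer every price once before using any index
        elif rule == "ucb":
            arm = max(range(P), key=lambda i: means[i]
                      + scale * math.sqrt(2.0 * log_horizon / counts[i]))
        elif rule == "ts":
            arm = max(range(P), key=lambda i: rng.gauss(means[i], scale / math.sqrt(counts[i])))
        else:
            raise ValueError("rule must be 'ucb' or 'ts'")
        revenues = sample_batch(prices[arm], batch_len)
        kept = revenues[burn_in:]  # drop the first periods of the batch
        obs = sum(kept) / len(kept)
        counts[arm] += 1
        means[arm] += (obs - means[arm]) / counts[arm]
        history.append(arm)
    return history, means, counts
```

For example, `run_batched_bandit([1.0, 5.0, 10.0], lambda p, n: toy_batch_revenue(p, n, rng), 200, 50, 5, 2.0, "ucb", rng)` should concentrate most batches on the price 5.0, whose expected per-period revenue $50e^{-1}\approx 18.4$ is the largest of the three. Holding a price fixed for a whole batch is what makes transient effects after a price change a per-batch cost rather than a per-period one.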
We prove that the cumulative regret is $O(\sqrt{PT\log T})$, where $T$ is the total number of time periods and $P$ is the cardinality of the feasible price set. This bound matches the lower bound up to a logarithmic factor. As an intermediate step, we also develop a coupling analysis that bounds the time for a queue to reach steady state, either from an empty state or from the steady state under a different set of system parameters. Our numerical experiments confirm the efficacy of the proposed BUCB and BTS algorithms.
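To give a feel for the coupling idea, here is a minimal sketch that assumes a much simpler system than the paper's. It uses an Erlang loss queue (M/M/c/c) with arrival rate `lam`, per-unit service rate `mu`, and `c` units. Arrivals that find all units busy balk, and there is no waiting room, hence no reneging. One copy starts empty and the other starts from the stationary (truncated Poisson) distribution. Both are driven by the same uniformized event stream, which keeps them ordered. Their first meeting time $\tau$ therefore satisfies the standard coupling inequality $d_{TV}(P_t(0,\cdot),\pi) \le \Pr(\tau > t)$. The function names and parameters are illustrative only.

```python
import math
import random


def erlang_loss_stationary(lam, mu, c):
    """Stationary law of M/M/c/c occupancy: Poisson(lam/mu) truncated to {0, ..., c}."""
    rho = lam / mu
    weights = [rho ** k / math.factorial(k) for k in range(c + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def coupling_time(lam, mu, c, rng, max_events=10**6):
    """Meeting time of an empty-start copy and a stationary-start copy of M/M/c/c.

    Both copies share one uniformized event stream (total rate lam + c*mu),
    which preserves the order x <= y; after meeting they move together.
    """
    total_rate = lam + c * mu
    x = 0  # copy started from the empty state
    y = rng.choices(range(c + 1), weights=erlang_loss_stationary(lam, mu, c))[0]
    t = 0.0
    for _ in range(max_events):
        if x == y:
            return t
        t += rng.expovariate(total_rate)
        u = rng.random() * total_rate
        if u < lam:
            # Arrival: admitted if a unit is free, otherwise it balks and is lost.
            x, y = min(x + 1, c), min(y + 1, c)
        else:
            # Potential completion at "virtual unit" j in [0, c); real only if j < occupancy.
            j = (u - lam) / mu
            if j < x:
                x -= 1
            if j < y:
                y -= 1
    raise RuntimeError("copies did not couple within max_events")


def tail_estimate(lam, mu, c, t, n, seed=0):
    """Monte Carlo estimate of P(tau > t), an upper bound on the TV distance at time t."""
    rng = random.Random(seed)
    taus = [coupling_time(lam, mu, c, rng) for _ in range(n)]
    return sum(tau > t for tau in taus) / n, taus
```

In this toy chain the gap $y - x$ never increases, and it shrinks at rate at least $(y-x)\mu$. The coupling time is thus stochastically bounded by a sum of exponentials, which is what makes the argument tractable. The paper's setting, with reneging and a switch between two parameter sets, is more involved.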