Abstract

A merchant dynamically sets the price in each time period when selling a product over a finite time horizon, starting from a given initial inventory. The merchant uses new external information observed at the beginning of each time period, while the demand function, which describes how the external information and the price jointly determine the single-period demand distribution, is unknown. The merchant's pricing decision serves two roles, learning the unknown demand function and balancing inventory, with the ultimate objective of maximizing the expected cumulative revenue. We characterize and prove a full spectrum of relations between the orders of the optimal expected cumulative revenue achieved in three decision-making regimes: the merchant's online decision-making regime, a clairvoyant regime with complete knowledge of the demand function, and a deterministic regime in which all uncertainties are relaxed to their expectations. In the analysis, we derive an unconstrained representation of the optimality gap for generic constrained online learning problems, which yields tractable lower and upper bounds on the expected revenue achieved by dynamic pricing algorithms across the different regimes. This analytical framework also inspires the design of two dual-based, history-dependent dynamic pricing algorithms, one for the clairvoyant regime and one for the online regime.

