Optimal Policies for Dynamic Pricing and Inventory Control with Nonparametric Censored Demands

Boxiao Chen, Yuan Zhou, Yining Wang

doi:10.2139/ssrn.3750413

Abstract

We study the fundamental model of joint pricing and inventory replenishment control under the learning-while-doing framework, with T consecutive review periods and a firm that does not know the demand curve a priori. At the beginning of each period, the retailer chooses both a price and an inventory order-up-to level, then collects revenue from consumers' realized demand while incurring costs from either holding unsold inventory or losing sales to unsatisfied customer demand. We make the following contributions to this fundamental problem:
1. We propose a novel inversion method based on empirical measures to consistently estimate the difference of the instantaneous reward functions at two prices. The method directly tackles the fundamental challenge posed by censored demands, without raising the order-up-to levels to unnaturally high values in order to collect more demand information. Based on this technical innovation, we design bisection and trisection search methods that attain O(T^{1/2}) regret, assuming the reward function is concave and only twice continuously differentiable.
2. In the more general case of non-concave reward functions, we design an active tournament elimination method that attains O(T^{3/5}) regret, also based on consistent estimates of reward differences at two prices.
3. We complement the O(T^{3/5}) regret upper bound with a matching \Omega(T^{3/5}) regret lower bound. The lower bound is established by a novel information-theoretic argument based on a generalized squared Hellinger distance, which differs significantly from conventional arguments based on Kullback-Leibler divergence. This lower bound shows that no learning-while-doing algorithm can achieve O(T^{1/2}) regret without assuming the reward function is concave, even if the sales revenue as a function of demand rate or price is concave.
Both the upper bound technique based on the difference estimator and the lower bound technique based on the generalized Hellinger distance are new in the literature, and can potentially be applied to other inventory or censored-demand problems that involve learning.
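To illustrate why estimates of reward differences at two prices are enough to drive a search over prices, the following is a minimal sketch of a generic trisection search for the maximizer of a concave function. It is not the authors' algorithm: the name `diff_estimate` is a hypothetical oracle assumed to return an estimate of reward(p1) - reward(p2), and the sketch ignores censoring, noise, inventory decisions, and the per-round sample sizes that a regret analysis would require.

```python
from typing import Callable


def trisection_search(
    diff_estimate: Callable[[float, float], float],
    lo: float,
    hi: float,
    tol: float = 1e-6,
) -> float:
    """Approximate the maximizer of a concave function on [lo, hi].

    diff_estimate(p1, p2) is a hypothetical oracle (an assumption of this
    sketch) that returns an estimate of reward(p1) - reward(p2). Only the
    sign of the returned value is used.
    """
    while hi - lo > tol:
        third = (hi - lo) / 3.0
        left, right = lo + third, hi - third
        if diff_estimate(left, right) < 0:
            # reward(left) < reward(right): by concavity, no maximizer
            # lies in [lo, left), so that third can be discarded.
            lo = left
        else:
            # reward(left) >= reward(right): by concavity, some maximizer
            # lies in [lo, right], so (right, hi] can be discarded.
            hi = right
    return (lo + hi) / 2.0


# Usage with an exact, noise-free difference of a toy concave reward.
def toy_reward(p: float) -> float:
    return -(p - 3.0) ** 2


best_price = trisection_search(
    lambda a, b: toy_reward(a) - toy_reward(b), 0.0, 10.0
)
```

The search never evaluates the reward itself, only the sign of a difference between two candidate prices. This is the property that makes a consistent difference estimator sufficient for this kind of search.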

Journal: SSRN Electronic Journal
Publication Date: Jan 1, 2020
Citations: 2

Similar Papers

Optimal Policies for Dynamic Pricing and Inventory Control with Nonparametric Censored Demands
Boxiao Chen ... Yining Wang
Management Science | VOL. 70
31 Aug 2023

Deterministic price–inventory management for substitutable products
H Ahmet Kuyumcu ... Ioana Popescu
Journal of Revenue and Pricing Management | VOL. 4
01 Jan 2006

Joint Pricing and Inventory Control under Reference Price Effects
Lisa Gimpl-Heersink
01 Jan 2009

Asymptotic Optimality of Constant-Order Policies in Joint Pricing and Inventory Control Models
Xin Chen ... Linwei Xin
SSRN Electronic Journal
09 May 2019
