Contextual dynamic pricing aims to set personalized prices based on sequential interactions with customers. At each time period, a customer who is interested in purchasing a product comes to the platform. The customer’s valuation for the product is a linear function of contexts, including product and customer features, plus some random market noise. The seller does not observe the customer’s true valuation and must instead learn it from contextual information and historical binary purchase feedback. Existing models typically assume full or partial knowledge of the random noise distribution. In this paper, we consider contextual dynamic pricing with unknown random noise in the linear valuation model. Our distribution-free pricing policy learns both the contextual function and the market noise simultaneously. A key ingredient of our method is a novel perturbed linear bandit framework, in which a modified linear upper confidence bound algorithm is proposed to balance the exploration of market noise and the exploitation of the current knowledge for better pricing. We establish a regret upper bound and a matching lower bound for our policy in the perturbed linear bandit framework and prove a sublinear regret bound in the considered pricing problem. Finally, we demonstrate the superior performance of our policy on simulations and a real-life auto loan data set.

Funding: Y. Liu and W.W. Sun acknowledge support from the National Science Foundation Division of Social and Economic Sciences [Grant NSF-SES 2217440].

Supplemental Material: The supplementary material is available at https://doi.org/10.1287/moor.2023.1369.
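
As an illustration only, the interaction described in the abstract (a linear valuation plus market noise, of which the seller sees only a binary purchase outcome) can be sketched as a single simulated round. This is not the paper's pricing policy. The function name `simulate_round`, the parameter names, the Gaussian noise, and the numeric values are all assumptions made for this sketch; in the setting above the noise distribution is unknown to the seller.

```python
import random


def simulate_round(context, theta, price, rng, noise_scale=0.1):
    """Simulate one pricing round under a linear valuation model.

    Returns (purchased, revenue). The seller would observe only these,
    never the valuation itself.
    """
    # Market noise. Gaussian is an assumption of this sketch; the setting
    # in the abstract treats the noise distribution as unknown.
    noise = rng.gauss(0.0, noise_scale)
    # Valuation = linear function of the context features + noise.
    valuation = sum(c * w for c, w in zip(context, theta)) + noise
    # Binary feedback: the customer buys iff valuation >= posted price.
    purchased = 1 if valuation >= price else 0
    revenue = price * purchased
    return purchased, revenue


rng = random.Random(0)      # fixed seed for reproducibility
theta = [0.5, 1.0]          # hypothetical true coefficients
context = [1.0, 2.0]        # hypothetical product/customer features
outcomes = [simulate_round(context, theta, price=2.0, rng=rng) for _ in range(5)]
```

A pricing policy in this setting would choose `price` in each round using only past contexts, posted prices, and the `purchased` flags.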