Distributed dynamic pricing of multiple perishable products using multi-agent reinforcement learning

Wenchuan Qiao, Min Huang, Zheming Gao, Xingwei Wang

doi:10.1016/j.eswa.2023.121252

Abstract

Revenue management (RM) is essential for a wide range of industries such as airlines, hotels, cruise lines, fashion, and seasonal retail. This paper focuses on the multi-perishable-product dynamic pricing (MPPDP) problem, a significant research field in RM, in which a company sells multiple interacting, perishable products over a limited selling window without replenishment. Most studies in this field assume that customer behavior, modeled by a demand function, is known in advance. Even when they consider uncertainty in customer behavior, most studies still assume that the mathematical form or structural properties of the underlying demand function are known in advance. However, these assumptions are usually inconsistent with actual market conditions. Recently, Reinforcement Learning (RL), a potent technique for sequential decision-making problems, has been increasingly applied to complex dynamic pricing problems without relying on any assumption about demand functions. However, the curse of dimensionality challenges the centralized RL algorithms currently in use when they are applied to the MPPDP problem, because the joint price space grows exponentially with the number of products. To address this issue, this paper proposes a distributed dynamic pricing framework and models the MPPDP problem as a Fully Cooperative Markov Game solved by Multi-Agent Reinforcement Learning (MARL). Additionally, we use counterfactual baselines to design agent-specific reward signals that help the agents in the resulting multi-agent cooperative system learn faster. Finally, we propose two MARL-based distributed dynamic pricing algorithms for the MPPDP problem: Counterfactual Q-learning and Counterfactual DQN.
Through case studies on four computer-simulated markets, we show that our algorithms alleviate the curse of dimensionality faced by centralized RL algorithms, expedite the learning process, and perform satisfactorily without relying on any assumption about demand functions. In conclusion, our work provides an effective MARL-based distributed dynamic pricing framework and algorithms that companies can use to price multiple perishable products efficiently in modern, highly uncertain markets.
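
As a rough illustration of the two ideas above, the following Python sketch compares the size of a centralized joint price space with the per-agent price sets of a distributed scheme, and computes agent-specific rewards with a counterfactual baseline. It is a minimal sketch under stated assumptions, not the authors' algorithm: the abstract does not give the exact form of the baseline, so the code assumes a generic difference-reward form (the global reward minus the global reward obtained when one agent's price is replaced by a default price). The names `joint_action_count`, `distributed_action_count`, `counterfactual_rewards`, `revenue_fn`, and `default_price` are hypothetical.

```python
from typing import Callable, List, Sequence


def joint_action_count(num_products: int, num_price_levels: int) -> int:
    """Number of joint price vectors a single centralized agent must consider."""
    return num_price_levels ** num_products


def distributed_action_count(num_products: int, num_price_levels: int) -> int:
    """Total number of individual actions when each product has its own agent."""
    return num_products * num_price_levels


def counterfactual_rewards(
    joint_prices: Sequence[int],
    revenue_fn: Callable[[Sequence[int]], int],
    default_price: int,
) -> List[int]:
    """Assumed difference-reward form of a counterfactual baseline.

    Agent i's reward is the global revenue minus the global revenue that
    would have been earned had agent i charged `default_price` instead,
    with all other agents' prices held fixed.
    """
    total = revenue_fn(joint_prices)
    rewards = []
    for i in range(len(joint_prices)):
        counterfactual = list(joint_prices)
        counterfactual[i] = default_price  # replace only agent i's action
        rewards.append(total - revenue_fn(counterfactual))
    return rewards


if __name__ == "__main__":
    # 5 products with 10 candidate prices each: exponential vs. linear growth.
    print(joint_action_count(5, 10))
    print(distributed_action_count(5, 10))

    # Hypothetical stand-in for a simulated market: revenue is the sum of prices.
    print(counterfactual_rewards((3, 5), lambda prices: sum(prices), 0))
```

In this sketch `revenue_fn` is queried as a black box, which mirrors the setting described above: the agents only observe rewards and do not need the functional form of demand.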

Full Text