Abstract
Dynamic pricing is difficult because the environment is highly dynamic and demand distributions are unknown. In this paper, we propose a Deep Reinforcement Learning (DRL) framework: a pipeline that automatically defines the DRL components needed to solve a dynamic pricing problem. Automating the pipeline is necessary because a DRL solution can be designed in numerous ways, and manually searching for a good configuration is tedious. This automation also enables non-experts to use DRL for dynamic pricing. Our DRL pipeline covers three steps of DRL design: MDP modeling, algorithm selection, and hyper-parameter optimization. It starts by transforming the available information into a state representation and defining the reward function using a reward shaping approach. Then, the hyper-parameters are tuned using a novel hyper-parameter optimization method that integrates Bayesian Optimization with the selection operator of the genetic algorithm. As a case study, we apply our DRL pipeline to reserve price optimization in online advertising. We show that the DRL configuration obtained by our pipeline yields a pricing policy whose revenue is significantly higher than that of the benchmark methods. The evaluation is performed in a simulation of the real-time bidding (RTB) environment that we developed, which makes exploration possible for the RL agent.
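To make the reserve price setting concrete, the following minimal sketch shows how a simulated auction could turn a chosen reserve price into seller revenue, which is the kind of signal an RL agent would learn from. It is an illustration only, not the simulator described in the paper: the second-price auction rule, the uniform bid distribution, and the names `auction_revenue` and `simulate` are all assumptions made for this example.

```python
import random


def auction_revenue(bids, reserve):
    """Seller revenue for one auction with a reserve price.

    Assumption: second-price rules (the winner pays the larger of the
    second-highest bid and the reserve). The abstract does not state
    the auction format.
    """
    if not bids:
        return 0.0
    ranked = sorted(bids, reverse=True)
    if ranked[0] < reserve:
        return 0.0  # no bid clears the reserve, so the impression is unsold
    second = ranked[1] if len(ranked) > 1 else 0.0
    return max(second, reserve)


def simulate(reserve, n_auctions=10_000, n_bidders=3, seed=0):
    """Average revenue per auction under hypothetical uniform(0, 1) bids."""
    rng = random.Random(seed)  # fixed seed so runs are comparable across reserves
    total = 0.0
    for _ in range(n_auctions):
        bids = [rng.random() for _ in range(n_bidders)]
        total += auction_revenue(bids, reserve)
    return total / n_auctions


if __name__ == "__main__":
    # Compare a few candidate reserve prices on the same simulated bids.
    for r in (0.0, 0.25, 0.5, 0.75):
        print(f"reserve={r:.2f} avg revenue={simulate(r):.3f}")
```

Under these assumptions, a reserve that is too low leaves revenue on the table and one that is too high leaves impressions unsold. A pricing policy has to learn this trade-off when the true bid distribution is unknown.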