Abstract
Many emerging businesses and services operate in growing environments where the stochastic demand gradually increases, often not only in its expectation but also in its variance. In such growing environments, sequential decision-making problems raise additional challenges when model parameters are unknown and must be dynamically learned from sequentially observed data. In this work, we use a single-product dynamic pricing problem to illustrate how a non-stationary growing environment influences policy design and policy performance, measured by regret. We prove matching upper and lower bounds on regret and design near-optimal pricing policies. We then demonstrate how the growth rate of the demand variance affects both the best achievable policy performance and the design of near-optimal policies. In the analysis, we also prove that whether or not the seller knows the length of the time horizon in advance leads to different optimal regret orders.
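To make the setting concrete, the following is a minimal sketch of a semi-myopic learn-and-price loop of the kind the abstract describes. All specifics here are illustrative assumptions, not the paper's model or policy: a linear demand curve `d_t = a - b*p_t + eps_t`, a noise standard deviation growing polynomially in `t` to mimic a growing market, ordinary least squares for parameter estimation, and small price dithering to keep the estimates identified.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical demand model (an assumption, not the paper's):
# d_t = a - b * p_t + eps_t, with Var(eps_t) growing over time.
a_true, b_true = 10.0, 2.0   # unknown to the seller
T = 2000                     # time horizon
p_min, p_max = 0.5, 4.0      # feasible price range

prices, demands = [], []
revenue = 0.0
for t in range(1, T + 1):
    if t <= 2:
        # Forced exploration: two distinct prices identify (a, b).
        p = p_min if t == 1 else p_max
    else:
        # Least-squares estimate of (a, b) from the history so far.
        X = np.column_stack([np.ones(len(prices)), -np.array(prices)])
        a_hat, b_hat = np.linalg.lstsq(X, np.array(demands), rcond=None)[0]
        b_hat = max(b_hat, 1e-3)  # guard against a degenerate slope estimate
        # Myopic revenue-maximizing price for the estimated model,
        # plus shrinking dithering so prices stay informative.
        p = a_true_free = a_hat / (2.0 * b_hat)
        p += rng.normal(0.0, t ** -0.25)
        p = float(np.clip(p, p_min, p_max))
    sigma_t = t ** 0.25  # demand noise grows with the market (assumed rate)
    d = a_true - b_true * p + rng.normal(0.0, sigma_t)
    prices.append(p)
    demands.append(d)
    revenue += p * d
```

Regret would compare `revenue` against a clairvoyant seller posting the optimal price `a_true / (2 * b_true)` every round; the abstract's point is that the achievable gap, and the right amount of exploration, depend on how fast the demand variance grows.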