Abstract

A fundamental yet notoriously difficult problem in operations management is the periodic inventory control problem under positive lead time and lost sales. More recently, there has been interest in the problem setting where the demand distribution is not known a priori and must be learned from the observations made during the decision-making process. In “Learning in Structured MDPs with Convex Cost Functions: Improved Regret Bounds for Inventory Management,” Agrawal and Jia present a reinforcement learning algorithm that uses the observed outcomes of past decisions to implicitly learn the underlying dynamics and adaptively improve the decision-making strategy over time. They show that, compared with the best base-stock policy, their algorithm achieves an optimal regret bound in terms of the time horizon and scales linearly with the lead time of the inventory ordering process. Furthermore, they demonstrate that their approach is not restricted to the inventory problem and can be applied in an almost black box manner to more general reinforcement learning problems with convex cost functions.
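
To make the problem setting concrete, the following is a minimal simulation sketch of the periodic-review, lost-sales inventory system with positive lead time and the base-stock benchmark policy referenced above. It follows the standard textbook dynamics; the cost parameters, the Poisson demand, and the function name simulate_base_stock are illustrative assumptions and are not taken from the paper, and the sketch is not the authors' learning algorithm.

```python
import numpy as np

def simulate_base_stock(base_stock_level, lead_time, horizon, demand_sampler,
                        holding_cost=1.0, lost_sales_penalty=4.0, seed=0):
    """Average per-period cost of a fixed base-stock policy in a
    periodic-review, lost-sales inventory system with positive lead time.
    (Illustrative sketch; parameters are assumptions, not from the paper.)"""
    rng = np.random.default_rng(seed)
    on_hand = 0.0                      # inventory physically on the shelf
    pipeline = [0.0] * lead_time       # orders placed but not yet delivered
    total_cost = 0.0

    for _ in range(horizon):
        # Orders placed lead_time periods ago arrive now.
        if lead_time > 0:
            on_hand += pipeline.pop(0)

        # Base-stock policy: order up to the target inventory position
        # (on-hand stock plus everything still in the pipeline).
        inventory_position = on_hand + sum(pipeline)
        order = max(base_stock_level - inventory_position, 0.0)
        if lead_time > 0:
            pipeline.append(order)
        else:
            on_hand += order           # zero lead time: order arrives immediately

        # Demand realizes; unmet demand is lost, not backlogged.
        demand = demand_sampler(rng)
        sales = min(on_hand, demand)
        lost_sales = demand - sales
        on_hand -= sales

        total_cost += holding_cost * on_hand + lost_sales_penalty * lost_sales

    return total_cost / horizon


# Illustrative use: evaluate a few candidate base-stock levels under Poisson(5) demand.
if __name__ == "__main__":
    demand = lambda rng: rng.poisson(5)
    for level in (5, 8, 10, 12):
        avg_cost = simulate_base_stock(level, lead_time=2, horizon=50_000,
                                       demand_sampler=demand)
        print(f"base-stock level {level}: average cost ~ {avg_cost:.2f}")
```

In a simulator of this kind the learning problem studied in the paper is to choose ordering decisions adaptively over the horizon without knowing the demand distribution, so that cumulative cost stays close to that of the best fixed base-stock level in hindsight; the regret of Agrawal and Jia's algorithm grows optimally in the time horizon and linearly in the lead time.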
