Abstract

A new methodology is presented for solving an important model of dynamic decision-making with a continuous unknown parameter (or state). The methodology centers on two concepts: the “continuation-value function,” which gives the expected value-to-go from every possible state under a feasible policy, and the “efficient frontier” of such functions in each period. When the model primitives can be described through a family of basis functions, e.g., polynomials, a continuation-value function retains that property and can be fully represented by a vector of basis weights. The efficient frontiers of the weight vectors can be constructed through backward induction, which leads to a substantial reduction in problem complexity and enables an exact solution for small problem instances. A set of approximation methods based on the new methodology is developed to tackle larger problems. The methodology is also extended to the multi-dimensional (multi-parameter) setting, which includes the important problem of contextual multi-armed bandits with linear expected rewards. We demonstrate that our approximation algorithm for that problem has a clear edge over three benchmark algorithms in the challenging learning environment with many actions and relatively short horizons.
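To make the problem class concrete, the sketch below implements LinUCB, a standard algorithm for contextual bandits with linear expected rewards (the setting named in the abstract). This is an illustration of the problem class only, not the paper's proposed method or necessarily one of its three benchmarks; the dimensions, horizon, and noise level are arbitrary choices for the example.

```python
import numpy as np

# Illustrative sketch: LinUCB for a contextual bandit with linear expected
# rewards. Each action a has an unknown parameter theta[a]; the expected
# reward of playing a under context x is theta[a] @ x. All constants here
# are arbitrary assumptions for the demo, not values from the paper.

rng = np.random.default_rng(0)
d, n_actions, horizon = 3, 5, 500
theta = rng.normal(size=(n_actions, d))      # unknown per-action parameters

A = np.stack([np.eye(d) for _ in range(n_actions)])  # ridge Gram matrices
b = np.zeros((n_actions, d))                 # response vectors
alpha = 1.0                                  # exploration width

total_reward = 0.0
for t in range(horizon):
    x = rng.normal(size=d)                   # observed context
    ucb = np.empty(n_actions)
    for a in range(n_actions):
        A_inv = np.linalg.inv(A[a])
        theta_hat = A_inv @ b[a]             # ridge estimate of theta[a]
        # optimistic index: estimated reward + confidence width
        ucb[a] = theta_hat @ x + alpha * np.sqrt(x @ A_inv @ x)
    a_star = int(np.argmax(ucb))
    r = theta[a_star] @ x + 0.1 * rng.normal()  # noisy linear reward
    A[a_star] += np.outer(x, x)              # update chosen arm's statistics
    b[a_star] += r * x
    total_reward += r

print(round(total_reward, 2))
```

The "many actions, short horizon" regime the abstract highlights is exactly where index policies of this form must spend a large fraction of the horizon exploring, which is what motivates the paper's alternative approach.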
