Abstract

Reinforcement learning for nonstationary problems is a subject of widespread research given that most realistic problems do not exist within static environments. Approaching these problems can require significant effort in feature engineering to provide a learning algorithm with enough useful information about the state space to uncover complex system dynamics. As an alternative for problems with sufficient data describing the nonstationary environment, we propose the contextual decomposition Markov decision process (CDMDP) as a collection of stationary sub-problems intended to approximate nonstationary problem dynamics using a linear combination of value functions. We demonstrate the effectiveness of the CDMDP approach with an application in military air battle management. We use a designed computational experiment and analysis of variance to show that a complex, nonstationary learning problem can be effectively approximated with a small set of stationary sub-problems, and that the CDMDP solution significantly improves solution quality over a baseline approach without the need for additional feature engineering. If a researcher suspects that a complex and continuously varying environment can be approximated by a small number of stationary contexts, the CDMDP framework may save significant computational resources and yield decision policies that are much easier to visualize and implement.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call