Abstract
In this paper, we introduce a novel method for the discovery of value functions for Markov decision processes (MDPs). This method, which we call value function discovery (VFD), is based on ideas from the field of evolutionary algorithms. VFD's key feature is that it discovers descriptions of value functions that are algebraic in nature. This feature is unique because the descriptions explicitly include the model parameters of the MDP. The algebraic expression of the value function discovered by VFD can be used in several scenarios, e.g., conversion to a policy (via one-step policy improvement) or control of systems with time-varying parameters. The work in this paper is a first step toward exploring potential usage scenarios of discovered value functions. We give a detailed description of VFD and illustrate its application on an example MDP. For this MDP, we let VFD discover an algebraic description of a value function that closely resembles the optimal value function. The discovered value function is then used to obtain a policy, which we compare numerically to the optimal policy of the MDP. The resulting policy shows near-optimal performance over a wide range of model parameters. Finally, we identify and discuss future application scenarios of discovered value functions.
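To make the "conversion to a policy" step concrete, the sketch below shows generic one-step policy improvement: given a (discovered) value function V, the greedy policy picks, in each state, the action maximizing the one-step lookahead value. This is a minimal illustration, not the paper's implementation; the toy MDP (dynamics, costs, discount factor) and the algebraic form of V_discovered are assumptions made here for illustration.

```python
# Minimal sketch (assumed toy MDP, not the paper's example) of converting a
# discovered algebraic value function into a policy via one-step improvement.

GAMMA = 0.9  # discount factor (assumed)

def V_discovered(s, lam=0.5):
    # Hypothetical algebraic value function in the state s and a model
    # parameter lam, of the kind VFD is described as producing.
    return -lam * s * (s + 1)

STATES = range(5)   # assumed small state space 0..4
ACTIONS = range(2)  # assumed two actions

def transitions(s, a):
    # Assumed dynamics: action 0 drifts down, action 1 drifts up;
    # returns a list of (next_state, probability) pairs.
    nxt = max(s - 1, 0) if a == 0 else min(s + 1, 4)
    return [(nxt, 0.8), (s, 0.2)]

def reward(s, a):
    # Assumed holding-type cost plus a small cost for action 1.
    return -s - (0.1 if a == 1 else 0.0)

def one_step_improved_policy(V):
    """Greedy (one-step lookahead) policy with respect to V."""
    policy = {}
    for s in STATES:
        q = {a: reward(s, a)
                + GAMMA * sum(p * V(s2) for s2, p in transitions(s, a))
             for a in ACTIONS}
        policy[s] = max(q, key=q.get)  # action with the highest Q-value
    return policy

print(one_step_improved_policy(V_discovered))
```

Because V_discovered is an explicit algebraic expression in the model parameter lam, the same improvement step can be re-run when parameters change, which is the time-varying-parameters scenario the abstract mentions.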