Abstract

Sequential Decision Making (SDM) problems optimize over the sequence of actions (or decisions) taken so as to minimize an underlying cumulative cost. This sequence of actions is referred to as the policy of the SDM. Often these problems involve additional parameters, some fixed and some manipulable, and the objective is to determine the optimal policy together with the manipulable parameters that minimize the SDM cost. In this paper we address the class of SDM problems characterized by dynamic parameters, where the dynamics are pre-specified for a subset of the parameters and manipulable for the others. The objective is to determine the manipulable parameter dynamics as well as the time-varying policy such that the associated SDM cost is minimized at each time instant. To this end, we develop a control-theoretic framework to design the manipulable parameter dynamics so that the parameters track their optimal values while the time-varying optimal policy is determined simultaneously. Our methodology builds upon a Maximum Entropy Principle (MEP) based framework that addresses SDMs. More precisely, this framework results in a smooth approximation of the SDM cost, which we use as a control Lyapunov function. We show that under the resulting control law the parameters asymptotically track a local optimum, that the control law is Lipschitz continuous and bounded, and that the policy of the SDM is optimal for a given set of parameter values. Simulations demonstrate the efficacy of the proposed methodology.
