Abstract

In control theory, to solve a finite-horizon sequential decision problem (SDP) commonly means to find a list of decision rules that result in an optimal expected total reward (or cost) when taking a given number of decision steps. SDPs are routinely solved using Bellman’s backward induction. Textbook authors (e.g. Bertsekas or Puterman) typically give more or less formal proofs to show that the backward induction algorithm is correct as a solution method for deterministic and stochastic SDPs. Botta, Jansson and Ionescu propose a generic framework for finite-horizon, monadic SDPs together with a monadic version of backward induction for solving such SDPs. In monadic SDPs, the monad captures a generic notion of uncertainty, while a generic measure function aggregates rewards. In the present paper, we define a notion of correctness for monadic SDPs and identify three conditions that allow us to prove a correctness result for monadic backward induction that is comparable to textbook correctness proofs for ordinary backward induction. The conditions that we impose are fairly general and can be cast in category-theoretical terms using the notion of Eilenberg–Moore algebra. They hold in familiar settings like those of deterministic or stochastic SDPs, but we also give examples in which they fail. Our results show that backward induction can safely be employed for a broader class of SDPs than usually treated in textbooks. However, they also rule out certain instances that were considered admissible in the context of Botta et al.’s generic framework. Our development is formalised in Idris as an extension of the Botta et al. framework and the sources are available as supplementary material.
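To make the Eilenberg–Moore remark more concrete, here is a minimal Python sketch of what checking a measure function against algebra laws could look like. It is an illustration under stated assumptions, not the paper's Idris development: the abstract does not spell out its three conditions, so the sketch only tests the two standard Eilenberg–Moore algebra laws (compatibility with the monad's unit and with its multiplication), uses the list monad as a stand-in for uncertainty, and checks the laws on a few sample values rather than proving them. The names `pure`, `join`, `satisfies_algebra_laws`, `average` and `SAMPLES` are hypothetical.

```python
from fractions import Fraction


def pure(x):
    """Unit of the list monad: a single, certain outcome."""
    return [x]


def join(xss):
    """Multiplication of the list monad: flatten one level of nesting."""
    return [x for xs in xss for x in xs]


def satisfies_algebra_laws(meas, samples):
    """Test the two Eilenberg-Moore algebra laws on sample nested lists.

    Unit law:           meas(pure(x)) == x
    Multiplication law: meas(map(meas, xss)) == meas(join(xss))
    Passing on samples is evidence only, not a proof.
    """
    unit_law = all(
        meas(pure(x)) == x for xss in samples for xs in xss for x in xs
    )
    mult_law = all(
        meas([meas(xs) for xs in xss]) == meas(join(xss)) for xss in samples
    )
    return unit_law and mult_law


def average(xs):
    """Arithmetic mean of a non-empty list of Fractions."""
    return sum(xs, Fraction(0)) / len(xs)


# Hypothetical sample data: non-empty lists of non-empty lists.
SAMPLES = [
    [[Fraction(1)], [Fraction(2), Fraction(3)]],
    [[Fraction(4), Fraction(0)], [Fraction(2)]],
]

print(satisfies_algebra_laws(max, SAMPLES))      # True on these samples
print(satisfies_algebra_laws(average, SAMPLES))  # False
```

On the first sample, averaging the inner averages gives 7/4, while averaging the flattened list gives 2, so the arithmetic mean over plain lists violates the multiplication law. Exact `Fraction` arithmetic is used so that the equality checks are not blurred by floating-point rounding.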

Highlights

  • Backward induction is a method introduced by Bellman (1957) that is routinely used to solve finite-horizon sequential decision problems (SDPs)

  • Two features are crucial for finite-horizon, monadic SDPs to be solvable with the BJI-framework that we study in this paper: (1) the number of decision steps has to be given explicitly as input to the backward induction and (2) at each decision step, the number of possible states has to be finite

  • Section 4, Correctness for monadic backward induction: we formally specify the notions of correctness for monadic backward induction bi and the value function val of the BJI-framework that we will study in the remainder of this paper. We develop these notions as generic variants of the corresponding notions for stochastic SDPs


Summary

Introduction

Backward induction is a method introduced by Bellman (1957) that is routinely used to solve finite-horizon sequential decision problems (SDPs). Botta, Jansson & Ionescu (2017a) propose a generic framework for monadic finite-horizon SDPs as a generalisation of the deterministic, non-deterministic and stochastic SDPs treated in control theory textbooks (Bertsekas, 1995; Puterman, 2014). This framework allows one to specify such problems and to solve them with a generic version of backward induction that we will refer to as monadic backward induction. We put forward a formal specification that monadic backward induction should meet in order to be considered “correct” as a solution method for monadic SDPs. This specification uses an optimisation criterion that is a generic version of the expected total reward of standard control theory textbooks. The new conditions are simple enough to be checked for non-standard instantiations of the framework. This makes it possible to broaden the applicability of backward induction to settings which are not commonly discussed in the literature and to obtain a formalised proof of correctness with considerably less effort. All source files can be type checked with Idris 1.3.2.
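As a rough illustration of backward induction with a pluggable measure (not the Botta et al. Idris formalisation), the following Python sketch takes the number of decision steps explicitly and iterates over a finite state set. It makes several simplifying assumptions: the state space does not depend on the decision step, uncertainty is modelled as a list of weighted next states instead of a general monad, and rewards are added with ordinary `+`. All names (`backward_induction`, `expected_value`, `worst_case`, and the toy problem `STATES`, `controls`, `step`, `reward`) are hypothetical.

```python
def expected_value(outcomes):
    """Measure for the stochastic case: probability-weighted sum."""
    return sum(v * w for v, w in outcomes)


def worst_case(outcomes):
    """Measure for the non-deterministic case: ignore weights, take the minimum."""
    return min(v for v, _ in outcomes)


def backward_induction(n_steps, states, controls, step, reward, measure):
    """Return (policies, value) for an n_steps-step decision problem.

    policies[t][x] is the control chosen at decision step t in state x;
    value[x] is the measured total reward of following them from x.
    """
    value = {x: 0.0 for x in states}  # zero steps left: nothing to collect
    policies = []
    for _ in range(n_steps):
        new_value, policy = {}, {}
        for x in states:
            best_y, best_v = None, float("-inf")
            for y in controls(x):
                # Reward of this step plus the value of the remaining steps,
                # aggregated over all possible next states by the measure.
                v = measure(
                    [(reward(x, y, x2) + value[x2], w) for x2, w in step(x, y)]
                )
                if v > best_v:  # strict: ties keep the first control
                    best_y, best_v = y, v
            policy[x], new_value[x] = best_y, best_v
        policies.insert(0, policy)  # later steps are computed first
        value = new_value
    return policies, value


# Toy problem (hypothetical): walk along 0 -> 1 -> 2, reward for landing on 2.
STATES = [0, 1, 2]


def controls(x):
    return ["stay", "go"]


def step(x, y):
    """Weighted next states: 'go' advances with weight 0.5, else stays put."""
    if y == "stay":
        return [(x, 1.0)]
    return [(min(x + 1, 2), 0.5), (x, 0.5)]


def reward(x, y, x2):
    return 1.0 if x2 == 2 else 0.0


policies, value = backward_induction(2, STATES, controls, step, reward, expected_value)
print(value)  # {0: 0.25, 1: 1.25, 2: 2.0}
```

Swapping `expected_value` for `worst_case` reuses the same induction with a different aggregation of uncertain outcomes. Whether the result is still optimal in the intended sense depends on the measure, which is the kind of question the correctness conditions are meant to settle.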

Finite-horizon sequential decision problems
The BJI-framework
Problem specification components
Problem solution components
BJI-framework wrap-up
Extension of the BJI-framework
The problem with the BJI-value function
Correctness conditions
Impact on previously admissible measures
Correctness proofs
Deterministic case
Lemmas
Correctness of monadic backward induction
Discussion
Conclusion
General remarks concerning the Idris formalisation
Monad laws
Preservation of extensional equality
Properties of monad algebras
Measure specifications
Verification with respect to val
Optimal extension
