Abstract

Markov decision processes (MDPs) with discounted cost are equivalent to processes with a finite, random duration; hence, the discount factor models a (random) time horizon for the life of the process. We elaborate on this idea, but show that an objective function which is a linear combination of several discounted costs (each with a different discount factor) does not, in general, model processes with several time scales, but rather processes with partial information.
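A sketch of the standard argument behind the stated equivalence, with notation introduced here for illustration: let $\beta \in (0,1)$ be the discount factor, $c(x_t, a_t)$ the stage cost, and $T$ a random killing time, independent of the process, with $\Pr(T > t) = \beta^t$ (i.e., the process survives each step with probability $\beta$). Then

$$
\mathbb{E}\!\left[\sum_{t=0}^{T-1} c(x_t, a_t)\right]
= \sum_{t=0}^{\infty} \Pr(T > t)\,\mathbb{E}\!\left[c(x_t, a_t)\right]
= \sum_{t=0}^{\infty} \beta^{t}\,\mathbb{E}\!\left[c(x_t, a_t)\right],
$$

so the expected total cost up to the geometric horizon $T$ coincides with the usual discounted cost.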
