Abstract

We present a novel simulation-based algorithm that extends the well-known policy iteration algorithm by combining multi-policy improvement with distributed, simulation-based voting policy evaluation. The algorithm approximately solves Markov Decision Processes (MDPs) under the infinite-horizon discounted reward criterion, and we analyze its performance relative to the optimal value.
