Abstract
We present a novel simulation-based algorithm for approximately solving Markov Decision Processes (MDPs) under the infinite-horizon discounted reward criterion. The algorithm extends the well-known policy iteration algorithm by combining multi-policy improvement with a distributed, simulation-based voting scheme for policy evaluation, and we analyze its performance relative to the optimal value.
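As a point of reference for the extension described above, a minimal sketch of the classic policy iteration baseline is shown below; the toy two-state, two-action MDP (`P`, `R`, `gamma`) is hypothetical and not taken from the paper, and the exact linear-system policy evaluation here is what the paper's simulation-based voting evaluation would replace.

```python
import numpy as np

# Hypothetical toy MDP: 2 states, 2 actions (illustrative only).
P = np.array([  # P[s, a, s'] transition probabilities
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.1, 0.9]],
])
R = np.array([[1.0, 0.0], [0.0, 2.0]])  # R[s, a] expected one-step reward
gamma = 0.9  # discount factor for the infinite-horizon criterion

def policy_iteration(P, R, gamma):
    n_states, n_actions = R.shape
    pi = np.zeros(n_states, dtype=int)  # start from an arbitrary policy
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
        # (The paper's algorithm replaces this step with distributed
        # simulation-based voting.)
        P_pi = P[np.arange(n_states), pi]
        R_pi = R[np.arange(n_states), pi]
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
        # Policy improvement: act greedily with respect to Q(s, a).
        Q = R + gamma * P @ V
        new_pi = Q.argmax(axis=1)
        if np.array_equal(new_pi, pi):
            return pi, V  # converged: pi is optimal for this MDP
        pi = new_pi

pi_star, V_star = policy_iteration(P, R, gamma)
```

Standard policy iteration converges in finitely many iterations for a finite MDP; the simulation-based variant trades this exactness for applicability when the transition model is only accessible through sampling.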