Abstract

A nested Markov decision process (NMDP) is a multi-level MDP consisting of an outer MDP and several inner MDPs. These MDPs are interdependent: each state of the outer MDP induces a unique inner MDP. We propose, for the first time, a heuristic algorithm (NMDP-HA) for solving an infinite-horizon nested Markov decision process under the average reward criterion. The algorithm incorporates the policy iteration method, which consists of a value-determination operation and a policy-improvement routine. To evaluate the solution quality and computational complexity of the NMDP-HA, we develop a specialized enumerative algorithm adapted from a completely observable MDP equivalent of the NMDP problem. The proposed NMDP-HA is illustrated with several numerical examples. For the problem instances evaluated, our results indicate that the heuristic algorithm can find the same optimal solution in a fraction of the time the total enumeration algorithm needs to exhaustively search the entire solution space. For the cases where the optimal solution is not found, the percentage deviation from the optimal solution is less than 5%.

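The building block named in the abstract, policy iteration under the average reward criterion, alternates a value-determination step (solving for the gain and bias of the current policy) with a policy-improvement step (acting greedily with respect to the bias). The following is only a minimal sketch of that standard Howard-style routine for an ordinary single-level MDP; the nested structure, the NMDP-HA heuristic itself, and its enumerative benchmark are not reproduced here, and all function names and the toy transition/reward data are assumptions introduced for illustration.

```python
import numpy as np

def value_determination(P, r, policy, ref=0):
    """Gain/bias evaluation of a fixed policy in a unichain average-reward MDP.

    Solves  g + h[s] = r[s, pi(s)] + sum_{s'} P[pi(s), s, s'] * h[s']
    with the normalization h[ref] = 0.
    """
    n = P.shape[1]
    A = np.zeros((n + 1, n + 1))
    b = np.zeros(n + 1)
    for s in range(n):
        a = policy[s]
        A[s, 0] = 1.0                          # coefficient of the gain g
        A[s, 1:] = np.eye(n)[s] - P[a, s]      # h[s] - sum_{s'} P[a, s, s'] h[s']
        b[s] = r[s, a]
    A[n, 1 + ref] = 1.0                        # pin h[ref] = 0
    x = np.linalg.solve(A, b)
    return x[0], x[1:]                         # gain g, bias vector h

def policy_improvement(P, r, h, policy):
    """Greedy policy w.r.t. the bias h; ties are kept at the current action."""
    Q = r + (P @ h).T                          # Q[s, a] = r[s, a] + sum_{s'} P[a, s, s'] h[s']
    greedy = Q.argmax(axis=1)
    keep = np.isclose(Q[np.arange(len(policy)), policy], Q.max(axis=1))
    return np.where(keep, policy, greedy)

def policy_iteration(P, r, max_iters=100):
    """Alternate value determination and policy improvement until the policy is stable."""
    n_actions, n_states, _ = P.shape
    policy = np.zeros(n_states, dtype=int)
    gain = 0.0
    for _ in range(max_iters):
        gain, h = value_determination(P, r, policy)
        new_policy = policy_improvement(P, r, h, policy)
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    return policy, gain

if __name__ == "__main__":
    # Toy 2-state, 2-action MDP (illustrative numbers only).
    P = np.array([[[0.9, 0.1], [0.2, 0.8]],    # action 0: P[0, s, s']
                  [[0.5, 0.5], [0.6, 0.4]]])   # action 1: P[1, s, s']
    r = np.array([[5.0, 10.0],                 # rewards r[s, a]
                  [-1.0, 2.0]])
    policy, gain = policy_iteration(P, r)
    print("policy:", policy, "average reward:", gain)
```

In the value-determination step the gain g and the n bias values form n + 1 unknowns, matched by the n policy-evaluation equations plus the single normalization h[ref] = 0, which is what makes the linear system square and solvable for unichain MDPs.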