The Markov Decision Process (MDP) is a popular mathematical framework for modeling stochastic sequential decision problems under uncertainty. These models appear in many application areas, including computer science, engineering, telecommunications, and finance. One of the most challenging goals is complexity reduction for large MDPs. In this paper, we propose an optimal strategy for solving large MDPs under the discounted-reward criterion. The proposed approach is based on a combination of a decomposition technique and an efficient parallel strategy: the global MDP is split into several sub-MDPs, which are then classified by level according to the strongly connected components principle. A master-slave strategy based on the Message Passing Interface (MPI) is proposed to solve the resulting problem. The performance of the proposed approach is evaluated in terms of scalability, cost, and execution speed.
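To make the decomposition idea concrete, below is a minimal serial sketch (not the paper's MPI implementation) of splitting a small tabular MDP into sub-MDPs by strongly connected components and solving them level by level, downstream components first, under a discounted reward. The toy MDP, the discount factor GAMMA = 0.9, and the helper names `find_sccs` and `solve_by_levels` are illustrative assumptions, not the authors' code.

```python
from collections import defaultdict

# Toy MDP (illustrative): state -> action -> list of (next_state, probability, reward)
mdp = {
    0: {'a': [(1, 1.0, 1.0)]},
    1: {'a': [(0, 0.5, 0.0), (2, 0.5, 2.0)]},
    2: {'a': [(3, 1.0, 0.0)]},
    3: {'a': [(2, 0.3, 1.0), (3, 0.7, 0.5)]},
}
GAMMA = 0.9  # discount factor (assumed)

def successors(s):
    """States reachable from s in one step under any action."""
    return {ns for outs in mdp[s].values() for ns, _, _ in outs}

def find_sccs(states):
    """Kosaraju's algorithm: list of SCCs in topological order of the condensation."""
    order, seen = [], set()
    def dfs(s):
        seen.add(s)
        for ns in successors(s):
            if ns not in seen:
                dfs(ns)
        order.append(s)
    for s in states:
        if s not in seen:
            dfs(s)
    rev = defaultdict(set)            # transposed transition graph
    for s in states:
        for ns in successors(s):
            rev[ns].add(s)
    sccs, seen = [], set()
    for s in reversed(order):
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            u = stack.pop()
            if u in seen:
                continue
            seen.add(u)
            comp.add(u)
            stack.extend(rev[u] - seen)
        sccs.append(comp)
    return sccs

def solve_by_levels(sccs):
    """Value iteration on each sub-MDP, processing downstream components first."""
    V = defaultdict(float)
    # Kosaraju yields source components first, so iterate backwards to start with
    # downstream components, whose values then act as fixed boundary values upstream.
    for comp in reversed(sccs):
        for _ in range(200):          # crude fixed-iteration convergence for the sketch
            for s in comp:
                V[s] = max(sum(p * (r + GAMMA * V[ns]) for ns, p, r in outs)
                           for outs in mdp[s].values())
    return dict(V)

if __name__ == '__main__':
    sccs = find_sccs(list(mdp))
    print('Sub-MDPs, downstream level first:', list(reversed(sccs)))
    print('Optimal values:', solve_by_levels(sccs))
```

Because downstream components are solved first, their value functions serve as fixed boundary values for the components that feed into them; in a master-slave MPI scheme such as the one outlined in the abstract, sub-MDPs belonging to the same level could plausibly be dispatched to slave processes and solved in parallel.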