Abstract

Applying model-based learning to optimal decision-making in a multi-agent system is not trivial, because obtaining the ground truth needed to train a model of a complex environment is expensive or even impossible. For example, when learning the optimal action of hydraulic supports in top-coal caving, the optimal action is not accessible as the ground truth for the corresponding state because of the intricate processes involved. Treating the latent ground truth as a hidden variable is an effective approach in the hidden Markov model. This paper extends the idea of a hidden ground-truth variable to the multi-agent system and proposes a hidden Markov random field (HMRF) model with reinforcement learning for optimizing the action decisions of the agents. In the HMRF model, the input states and the output actions of the agents are treated as an observable random field and a latent Markov random field, respectively. Based on the HMRF model, the optimal decision is inferred by maximizing the posterior probability, with the prior probability obtained by Q-learning. The parameters of the HMRF model are estimated by the expectation-maximization algorithm. In the top-coal caving experiment, the proposed method markedly improves the recall of top coal at the cost of only a very small increase in the rock rate. The proposed method is also applied to the predator-prey problem in the gym. The experimental results show that communication between agents through the HMRF increases the reward of the prey.

Highlights

  • Top-coal caving is currently the most efficient mining method for underground thick coal seams [1]

  • To address these issues, this paper proposes a probabilistic graphical model (PGM) [6] for the multi-agent system to obtain the optimal decision. The motivation is that, in the PGM of a multi-agent system, the agents and their relationships are depicted uniformly by nodes and edges, so the model architecture explicitly describes the relationship between the current agent and its neighbors, and the optimal decision can be inferred from the PGM directly

  • This paper proposes a new method based on the hidden Markov random field (HMRF) for the optimal decision of a multi-agent system in top-coal caving, in which each agent is treated as a node of a probabilistic graphical model, and the decision of each agent is inferred by maximum a posteriori (MAP) estimation, with the expectation-maximization (EM) algorithm estimating the parameters
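
To make the MAP step concrete, the following is a minimal sketch, not the paper's implementation. It assumes a chain of agents, a scalar observed state per agent, Gaussian emissions with per-action parameters `mu` and `sigma`, a prior obtained as a softmax over Q-values, and a Potts-style neighbor bonus `beta`. All of these names and modeling choices are hypothetical. The inference routine is iterated conditional modes (ICM), a common approximate MAP procedure for HMRFs; the source does not state which routine it uses. The EM parameter update is omitted, so `mu` and `sigma` are taken as given.

```python
import numpy as np

def softmax(q, temperature=1.0):
    """Turn Q-values into a prior over actions (assumed form of the prior)."""
    z = (q - q.max(axis=-1, keepdims=True)) / temperature
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def log_gaussian(x, mu, sigma):
    """Log-density of an assumed Gaussian emission model."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

def icm_map(x, q_values, mu, sigma, beta=0.5, n_sweeps=10):
    """Approximate MAP actions for a chain of agents via ICM.

    x        : (n_agents,) observed states (the observable field)
    q_values : (n_agents, n_actions) Q-values used to build the prior
    mu, sigma: (n_actions,) emission parameters, assumed already estimated
    beta     : bonus for agreeing with a neighbor's action (hypothetical)
    """
    n_agents, _ = q_values.shape
    log_prior = np.log(softmax(q_values))
    log_lik = log_gaussian(x[:, None], mu[None, :], sigma[None, :])
    unary = log_prior + log_lik            # per-agent score without neighbors
    actions = unary.argmax(axis=1)         # initialise ignoring the field
    for _ in range(n_sweeps):
        changed = False
        for i in range(n_agents):
            score = unary[i].copy()
            for j in (i - 1, i + 1):       # chain neighbors of agent i
                if 0 <= j < n_agents:
                    score[actions[j]] += beta
            best = int(score.argmax())
            if best != actions[i]:
                actions[i] = best
                changed = True
        if not changed:                    # local optimum reached
            break
    return actions
```

With `beta=0` each agent decides independently from its own prior and observation. A larger `beta` lets neighboring agents pull an ambiguous agent toward their action, which is the kind of inter-agent coupling the HMRF is meant to express.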


Summary

INTRODUCTION

Top-coal caving is currently the most efficient mining method for underground thick coal seams [1]. The optimal decision of the windows’ actions is one of the most important issues in top-coal caving: the goal is to maximize the top coal captured and minimize the number of rocks falling into the drag conveyor. To maximize the output of top coal with fewer rocks by automating the windows’ actions, it is important to develop models that can control the windows collaboratively. The decision maker of each window is an agent that can learn from experience to predict which action to take. In this multi-agent system of the FMCW, the dimension of the state space and the computational complexity grow exponentially with the number of agents [5]. (I) The actions of the hydraulic support (HS) windows in top-coal caving are cast in a multi-agent decision methodology to approach optimal performance.

RELATED WORKS
PRELIMINARIES
HIDDEN MARKOV RANDOM FIELD FOR MULTI-AGENT SYSTEM
MAXIMUM A POSTERIORI WITH HMRF FOR OPTIMAL DECISION
CONCLUSION