Abstract
Reinforcement learning is a model-free approach to learning what to do, that is, how to map situations to actions, so as to maximize a numerical reward signal. It offers a practical method for systems that operate in complex environments for which accurate models are very difficult to build. However, many practical problems require maximizing reward while keeping cost low. For example, coal mine production is closely tied to safety: output can be increased only within the limits that safety conditions allow. Building on the Markov decision process (MDP) and reinforcement learning, this paper introduces the constrained Markov decision process into reinforcement learning. It extends the Q-learning algorithm with a cost factor, yielding a new Q-learning algorithm based on the constrained MDP. Finally, reflecting the constraint between production and safety in coal mines, the paper presents a simulation study of action control for a coal shearer at a coal mine working face. The simulation results verify the validity of the method.
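The abstract does not state the paper's exact update rule, so the following is only a minimal sketch of one common way a cost factor can enter Q-learning: the cost is subtracted from the reward with a fixed penalty weight (a Lagrangian-style relaxation of the constrained MDP). The one-state toy problem, the action names `fast` and `slow`, their reward and cost values, and the parameter `lam` are all hypothetical and chosen for illustration; they are not taken from the paper.

```python
import random

# Hypothetical toy problem (not from the paper): a single state and two
# shearer "speeds". Each action maps to (reward, cost): "fast" yields more
# production but incurs a safety cost, "slow" yields less with no cost.
ACTIONS = {"fast": (2.0, 1.0), "slow": (1.0, 0.0)}


def q_learning_with_cost(lam, steps=2000, alpha=0.1, gamma=0.9,
                         epsilon=0.1, seed=0):
    """Tabular Q-learning on the penalized signal reward - lam * cost.

    lam is an assumed fixed penalty weight; a full constrained-MDP
    treatment would typically tune it against a cost budget.
    """
    rng = random.Random(seed)
    q = {a: 0.0 for a in ACTIONS}
    names = list(ACTIONS)
    for _ in range(steps):
        # Epsilon-greedy action selection.
        if rng.random() < epsilon:
            action = rng.choice(names)
        else:
            action = max(q, key=q.get)
        reward, cost = ACTIONS[action]
        # Cost factor: penalize the reward before the usual update.
        penalized = reward - lam * cost
        # Single-state problem, so the next state is the same state.
        target = penalized + gamma * max(q.values())
        q[action] += alpha * (target - q[action])
    return q


def greedy_action(q):
    """Return the action with the highest learned Q-value."""
    return max(q, key=q.get)
```

In this sketch, calling `greedy_action(q_learning_with_cost(0.0))` ignores cost entirely, while a larger weight such as `lam=2.0` makes the costly action unattractive, which illustrates the production versus safety trade-off the abstract describes.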