Abstract

The first part of this two-part series of papers provides a survey of recent advances in Deep Reinforcement Learning (DRL) applications for solving partially observable Markov decision process (POMDP) problems. Reinforcement Learning (RL) is an approach that simulates the human's natural learning process; its key idea is to let the agent learn by interacting with a stochastic environment. Because the agent needs only limited access to information about the environment, the approach can be applied in many fields that require self-learning. Although efficient algorithms are already in wide use, an organized investigation remains essential: it allows sound comparisons and helps in choosing the best structures or algorithms when applying DRL to various applications. In this overview, we introduce Markov Decision Process (MDP) problems and Reinforcement Learning, and we review applications of DRL for solving POMDP problems in games, robotics, and natural language processing. A follow-up paper will cover applications in transportation, communications and networking, and industry.

Highlights

  • We focus on applications, generally based on Deep Reinforcement Learning, to partially observable Markov decision process (POMDP) problems

  • Programmatically Interpretable Reinforcement Learning (PIRL) generates interpretable agent policies, and a new method called Neurally Directed Program Search (NDPS) finds a programmatic policy with maximal reward. In addition to the works above, several techniques are discussed by Shao et al [59]: Sharma et al [78] proposed Fine Grained Action Repetition (FiGAR) to improve Deep Deterministic Policy Gradient (DDPG); Gao et al [79] used The Open Racing Car Simulator (TORCS) to evaluate Normalized Actor-Critic (NAC); Mazumder et al

  • For more surveys in robotics, see [111] from 2009, a study of robot learning from demonstration (LfD), in which a policy is learned from demonstrations provided by a teacher; Deisenroth [112] surveyed policy search for robotics in 2013; in 2014, Kober and Peters [113] provided a general survey on RL in robotics; and in 2018, Tai et al [114] presented a comprehensive survey on learning control in robotics, from reinforcement to imitation

Summary

Markov Decision Processes

A Markov Decision Process is a general mathematical framework for representing an optimal sequence of decisions in an uncertain environment. Depending on the current state and the action taken, rewards are received as either positive gains or negative costs. Another characteristic of an MDP is the uncertainty in the successor state given the action taken. The expected utility of following policy π from state s is the state value function Vπ(s) of the policy, which is not random: Vπ(s) = E[Uπ(s)] = E[∑_t γ^t R(s_t)]. The state-action value function Qπ(s, a) of a policy, called the Q-value, is the expected utility of taking action a from state s and then following policy π: Qπ(s, a) = ∑_{s'} T(s, a, s')[R(s, a, s') + γ Vπ(s')], where T(s, a, s') is the transition probability. When s is not an end state, the state value equals the Q-value of the action the policy chooses, which yields the Bellman equation: Vπ(s) = Qπ(s, π(s)) = ∑_{s'} T(s, π(s), s')[R(s, π(s), s') + γ Vπ(s')].
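To make the recursion concrete, the sketch below applies the Bellman equation as an iterative policy-evaluation loop on a tiny tabular MDP. It is a minimal illustration only; the transition tensor T, reward tensor R, policy pi, and discount gamma are hypothetical placeholders, not values from the surveyed works.

    import numpy as np

    def evaluate_policy(T, R, pi, gamma=0.9, tol=1e-8):
        """Compute Vpi(s) by repeatedly applying the Bellman equation.

        T[s, a, s'] : probability of landing in s' after taking a in s
        R[s, a, s'] : reward collected on that transition
        pi[s]       : action chosen by a deterministic policy in state s
        """
        n_states = T.shape[0]
        V = np.zeros(n_states)
        while True:
            # Bellman backup: V(s) = sum_s' T(s, pi(s), s') * (R(s, pi(s), s') + gamma * V(s'))
            V_new = np.array([
                np.sum(T[s, pi[s]] * (R[s, pi[s]] + gamma * V))
                for s in range(n_states)
            ])
            if np.max(np.abs(V_new - V)) < tol:
                return V_new
            V = V_new

    # Illustrative 2-state, 2-action MDP (the numbers are arbitrary).
    T = np.array([[[0.8, 0.2], [0.1, 0.9]],
                  [[0.5, 0.5], [0.0, 1.0]]])
    R = np.ones_like(T)      # reward of 1 on every transition
    pi = np.array([0, 1])    # fixed deterministic policy
    print(evaluate_policy(T, R, pi))

With a reward of 1 per step and gamma = 0.9, the loop converges to a value of about 10 for each state, which matches the geometric-series sum ∑_t γ^t implied by the value-function definition above.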

Partially Observable Markov Decision Processes
Reinforcement Learning
RL Algorithms
Deep Reinforcement Learning
Value-Based Algorithms
Policy-Based Algorithms
Actor-Critic Algorithms
Applications
Board Games
Card Games
Video Games
Robotics
Manipulation
Locomotion
Robotics Simulators
Natural Language Processing
Neural Machine Translation
Dialogue
Visual Dialogue
Summary
Final Thoughts