Abstract
Decision making is one of the central problems in artificial intelligence, and specifically in robotics. In most cases this problem comes with uncertainty, both in the data received by the decision maker/agent and in the actions performed in the environment. One effective approach is to model the environment and the agent as a Partially Observable Markov Decision Process (POMDP). POMDPs have a wide range of applications, including machine vision, marketing, network troubleshooting, and medical diagnosis. In recent years, there has been significant interest in developing techniques for finding policies for POMDPs. We consider two new techniques, called Recursive Point Filter (RPF) and Scan Line Filter (SCF), based on the Incremental Pruning (IP) POMDP solver, to provide an alternative to the Linear Programming (LP) filter used by IP. Both RPF and SCF found solutions for several POMDP problems on which LP could not converge within 24 hours. Experiments are run on problems from the POMDP literature, and an Average Discounted Reward (ADR) is computed by testing each policy in a simulated environment.
Highlights
One of the most challenging tasks of an intelligent decision maker or agent is planning, or choosing how to act in the course of its interactions with the environment
There has been significant interest in developing techniques for finding policies for POMDPs. We consider two new techniques, called Recursive Point Filter (RPF) and Scan Line Filter (SCF), based on the Incremental Pruning (IP) Partially Observable Markov Decision Process (POMDP) solver, to provide an alternative to the Linear Programming (LP) filter for IP
We considered two new filtering techniques, called Recursive Point Filter (RPF) and Scan Line Filter (SCF), for the Incremental Pruning (IP) POMDP solver, to provide an alternative to the Linear Programming (LP) filter
Summary
One of the most challenging tasks of an intelligent decision maker or agent is planning, or choosing how to act in the course of its interactions with the environment. Such agent/environment interactions can often be effectively modelled as a Partially Observable Markov Decision Process (POMDP). Operations research [1,2] and stochastic control [3] are two domains where this model is applied to balance competing objectives, action costs, uncertainty in action effects, and observations that provide incomplete knowledge about the world. Planning, in the context of a POMDP, corresponds to finding an optimal policy for the agent to follow.
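For context, the LP filter that RPF and SCF are proposed to replace is, in standard Incremental Pruning, a dominance test: each candidate alpha-vector is kept only if a small linear program can find a belief point where it beats every other vector. The sketch below is a minimal illustration of that standard test, not the paper's implementation; the function name lp_dominates, the tolerance, and the example vectors are our own assumptions.

# Minimal sketch (assumed, not from the paper) of the standard LP
# dominance test used to filter alpha-vectors in Incremental Pruning.
import numpy as np
from scipy.optimize import linprog

def lp_dominates(w, U):
    """Return a witness belief where vector w beats every u in U, or None.

    Solves: max d  s.t.  b.(w - u) >= d  for all u in U,
                          b in the probability simplex.
    """
    n = len(w)
    # Variables: [b_1..b_n, d]; linprog minimizes, so the objective is -d.
    c = np.zeros(n + 1)
    c[-1] = -1.0
    # Constraint rows encode b.(u - w) + d <= 0 for every u in U.
    A_ub = np.array([np.append(u - w, 1.0) for u in U])
    b_ub = np.zeros(len(U))
    # Belief probabilities must sum to 1; d is unconstrained.
    A_eq = np.append(np.ones(n), 0.0).reshape(1, -1)
    b_eq = np.array([1.0])
    bounds = [(0.0, 1.0)] * n + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    if res.success and -res.fun > 1e-9:  # d > 0: w is useful somewhere
        return res.x[:n]
    return None

# Hypothetical example: filter a set of 2-state alpha-vectors.
vectors = [np.array(v) for v in ([1.0, 0.0], [0.0, 1.0], [0.4, 0.4])]
useful = [w for i, w in enumerate(vectors)
          if lp_dominates(w, vectors[:i] + vectors[i+1:]) is not None]
print(len(useful))  # the dominated [0.4, 0.4] vector is pruned -> 2

Because this test solves one LP per candidate vector at every pruning step, it can dominate the running time of IP on hard problems, which is the motivation for the point-based RPF and SCF alternatives the paper introduces.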