Partially Observable Markov Decision Process Research Articles

Reinforcement learning (RL) has garnered significant attention for developing decision-making agents that aim to maximize rewards, specified by an external supervisor, within fully observable environments. However, many real-world problems involve partial or noisy observations, where agents cannot access complete and accurate information about the environment. These problems are commonly formulated as partially observable Markov decision processes (POMDPs). Previous studies have tackled RL in POMDPs by either incorporating the memory of past actions and observations or by inferring the true state of the environment from observed data. Nevertheless, aggregating observations and actions over time becomes impractical in problems with large decision-making time horizons and high-dimensional spaces. Furthermore, inference-based RL approaches often require many environmental samples to perform well, as they focus solely on reward maximization and neglect uncertainty in the inferred state. Active inference (AIF) is a framework naturally formulated in POMDPs and directs agents to select actions by minimizing a function called expected free energy (EFE). This supplies reward-maximizing (or exploitative) behavior, as in RL, with information-seeking (or exploratory) behavior. Despite this exploratory behavior of AIF, its use is limited to problems with small time horizons and discrete spaces due to the computational challenges associated with EFE. In this article, we propose a unified principle that establishes a theoretical connection between AIF and RL, enabling seamless integration of these two approaches and overcoming their limitations in continuous space POMDP settings. We substantiate our findings with rigorous theoretical analysis, providing novel perspectives for using AIF in designing and implementing artificial agents. Experimental results demonstrate the superior learning capabilities of our method compared to other alternative RL approaches in solving partially observable tasks with continuous spaces. Notably, our approach harnesses information-seeking exploration, enabling it to effectively solve reward-free problems and rendering explicit task reward design by an external supervisor optional.

Abstract Resource allocation for invasive species management requires information about the size of the invasive population, which may be expensive and time‐consuming to obtain. The trade‐off between investment in monitoring and control efforts is a challenging decision problem, and existing mathematical tools are often difficult to interpret, and/or limited to a specific case study. We propose a partially observable Markov decision process (POMDP) framework to help decision‐makers understand effective monitoring and control policymaking. POMDPs can deal with uncertainty in both the model and state of the system but are more challenging to solve due to the continuous and high‐dimensional state space. Rather than limiting the possible states of the system, as do most previously proposed methods, we work through the development of a density projection approach, which reduces the dimensionality of the space of beliefs by restricting them to a parametrised family of probability distributions. This serves to align the mathematical representation of the problem with the real‐world quantities relevant to human decision‐making. The result of our model is a sequence of actions, which minimises the expected cost incurred in managing the invasive species, where the recommendation depends on an estimate of the species' abundance, and the uncertainty in this estimate. We demonstrate the effectiveness of our proposed framework with a case study on tropical fire ant (Solenopsis geminata) control and two generic case studies of varying complexity. Furthermore, we investigate sensitivity of the results to the choices of control cost and efficacy, and monitoring cost and error. The framework proposed by this paper makes the powerful machinery of POMDPs available to environmental managers. It computes the optimal course of action to manage a growing population of an invasive species, incorporating a varying time horizon and multiple control interventions. We sidestep the computational difficulties of general POMDPs to provide a clear, visual overview of decision‐making recommendations, and how these decisions change in new situations. Initial results and scenario‐based analysis show promising results, and the framework could be extended to the related field of disease management.

Partially Observable Markov Decision Process Research Articles

Related Topics

Articles published on Partially Observable Markov Decision Process

Multi-Agent DRL for Air-to-Ground Communication Planning in UAV-Enabled IoT Networks.

A partitioning Monte Carlo approach for consensus tasks in crowdsourcing

Adaptive Compensation for Robotic Joint Failures Using Partially Observable Reinforcement Learning

Online Learning Strategy Induction through Partially Observable Markov Decision Process-Based Cognitive Experience Model

Multi-agent reinforcement learning for task offloading with hybrid decision space in multi-access edge computing

Active Inference and Reinforcement Learning: A Unified Inference on Continuous State and Action Spaces Under Partial Observability.

Multi-objective joint optimization of task offloading based on MADRL in internet of things assisted by satellite networks

Reinforcement Learning for Jointly Optimal Coding and Control Policies for a Markovian System Controlled over a Communication Channel

No compromise in solution quality: Speeding up belief-dependent continuous partially observable Markov decision processes via adaptive multilevel simplification

Battery health-considered energy management strategy for a dual-motor two-speed battery electric vehicle based on a hybrid soft actor-critic algorithm with memory function

Digital twins-boosted intelligent maintenance of ageing bridge hangers exposed to coupled corrosion–fatigue deterioration

A Workflow for Building Computationally Rational Models of Human Behavior

Transfer entropy on collective motion with undeclared loose leader–follower (LLF) structure

Rapid data collection and processing in dense urban edge computing networks with drone assistance

When to monitor or control: Informed invasive species management using a partially observable Markov decision process (POMDP) framework

Research on mobile robot path planning in complex environment based on DRQN algorithm

Competitive pricing for ride-sourcing platforms with MARL

Examining chronic kidney disease screening frequency among diabetics: a POMDP approach.

Trajectory planning for airborne radar in extended target tracking based on deep reinforcement learning

Strong Simple Policies for POMDPs

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Partially Observable Markov Decision Process Research Articles

Related Topics

Articles published on Partially Observable Markov Decision Process

Multi-Agent DRL for Air-to-Ground Communication Planning in UAV-Enabled IoT Networks.

A partitioning Monte Carlo approach for consensus tasks in crowdsourcing

Adaptive Compensation for Robotic Joint Failures Using Partially Observable Reinforcement Learning

Online Learning Strategy Induction through Partially Observable Markov Decision Process-Based Cognitive Experience Model

Multi-agent reinforcement learning for task offloading with hybrid decision space in multi-access edge computing

Active Inference and Reinforcement Learning: A Unified Inference on Continuous State and Action Spaces Under Partial Observability.

Multi-objective joint optimization of task offloading based on MADRL in internet of things assisted by satellite networks

Reinforcement Learning for Jointly Optimal Coding and Control Policies for a Markovian System Controlled over a Communication Channel

No compromise in solution quality: Speeding up belief-dependent continuous partially observable Markov decision processes via adaptive multilevel simplification

Battery health-considered energy management strategy for a dual-motor two-speed battery electric vehicle based on a hybrid soft actor-critic algorithm with memory function

Digital twins-boosted intelligent maintenance of ageing bridge hangers exposed to coupled corrosion–fatigue deterioration

A Workflow for Building Computationally Rational Models of Human Behavior

Transfer entropy on collective motion with undeclared loose leader–follower (LLF) structure

Rapid data collection and processing in dense urban edge computing networks with drone assistance

When to monitor or control: Informed invasive species management using a partially observable Markov decision process (POMDP) framework

Research on mobile robot path planning in complex environment based on DRQN algorithm

Competitive pricing for ride-sourcing platforms with MARL

Examining chronic kidney disease screening frequency among diabetics: a POMDP approach.

Trajectory planning for airborne radar in extended target tracking based on deep reinforcement learning

Strong Simple Policies for POMDPs