Partial Observations Research Articles

Real-world scenarios often involve dynamic interactions among competing agents, where each agent's decisions account for the actions taken by others. These situations can be modeled as partially observable stochastic games (POSGs), with zero-sum variants capturing strictly competitive interactions (e.g., security scenarios). While such models address a broad range of problems, they commonly focus on infinite-horizon scenarios with discounted-sum objectives. Using the discounted-sum objective, however, can lead to suboptimal solutions in cases where the length of the interaction does not directly affect the rewards gained by the players. We thus focus on games with an undiscounted objective and an indefinite horizon, where every realization of the game is guaranteed to terminate after some unspecified number of turns. To manage the computational complexity of solving POSGs in general, we restrict attention to games with one-sided partial observability, where only one player has imperfect information while their opponent is provided with full information about the current situation. We introduce two novel algorithms based on the heuristic search value iteration (HSVI) algorithm that iteratively solve sequences of easier-to-solve approximations of the game, using fundamentally different approaches for constructing the sequences: (1) in GoalHorizon, the game approximations are based on a limited number of turns in which players can change their actions; (2) in GoalDiscount, the game approximations are constructed using an increasing discount factor. We provide theoretical qualitative guarantees for the algorithms, and we also experimentally demonstrate that these algorithms are able to find near-optimal solutions on pursuit-evasion games and a game modeling a privilege escalation problem from computer security.
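The idea of approaching an undiscounted goal-reaching objective through a sequence of increasingly weakly discounted problems can be illustrated on a much simpler model. The sketch below is not GoalDiscount and not HSVI: it assumes a small, fully observable, single-agent chain MDP (the functions `build_chain`, `value_iteration`, and `discount_continuation`, and all numeric parameters, are hypothetical and chosen only for illustration). It runs plain value iteration for an increasing sequence of discount factors, warm-starting each solve from the previous solution.

```python
import numpy as np


def build_chain(n=5, p_step=0.8, p_jump=0.3):
    """Toy goal-reaching MDP (an assumption for illustration only).

    States 0..n-1, with n-1 an absorbing goal. Every non-goal step costs 1.
    Action 0 moves one state forward with probability p_step, else stays.
    Action 1 moves two states forward with probability p_jump, else stays.
    Returns P with shape (A, S, S) and R with shape (A, S).
    """
    goal = n - 1
    P = np.zeros((2, n, n))
    R = np.full((2, n), -1.0)
    for s in range(goal):
        P[0, s, s + 1] = p_step
        P[0, s, s] = 1.0 - p_step
        P[1, s, min(s + 2, goal)] = p_jump
        P[1, s, s] = 1.0 - p_jump
    P[:, goal, goal] = 1.0  # the goal is absorbing
    R[:, goal] = 0.0        # and costs nothing
    return P, R


def value_iteration(P, R, gamma, v0, tol=1e-10, max_iter=100_000):
    """Standard value iteration started from v0."""
    v = v0.copy()
    for _ in range(max_iter):
        q = R + gamma * (P @ v)  # action values, shape (A, S)
        v_new = q.max(axis=0)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new
    return v


def discount_continuation(P, R, gammas):
    """Solve for each discount factor in turn, warm-starting each solve."""
    v = np.zeros(P.shape[1])
    history = []
    for gamma in gammas:
        v = value_iteration(P, R, gamma, v)
        history.append(v.copy())
    return history


if __name__ == "__main__":
    P, R = build_chain()
    v_undiscounted = value_iteration(P, R, 1.0, np.zeros(P.shape[1]))
    for gamma, v in zip([0.5, 0.9, 0.99, 0.9999],
                        discount_continuation(P, R, [0.5, 0.9, 0.99, 0.9999])):
        gap = np.max(np.abs(v - v_undiscounted))
        print(f"gamma={gamma}: max gap to undiscounted value = {gap:.6f}")
```

In this toy setting the gap to the undiscounted value shrinks as the discount factor approaches 1, which is the intuition behind solving a sequence of discounted approximations. The actual algorithms operate on one-sided POSGs with two players and belief-based value functions, which this sketch does not attempt to model.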

Reinforcement learning (RL) has garnered significant attention for developing decision-making agents that aim to maximize rewards, specified by an external supervisor, within fully observable environments. However, many real-world problems involve partial or noisy observations, where agents cannot access complete and accurate information about the environment. These problems are commonly formulated as partially observable Markov decision processes (POMDPs). Previous studies have tackled RL in POMDPs by either incorporating the memory of past actions and observations or by inferring the true state of the environment from observed data. Nevertheless, aggregating observations and actions over time becomes impractical in problems with large decision-making time horizons and high-dimensional spaces. Furthermore, inference-based RL approaches often require many environmental samples to perform well, as they focus solely on reward maximization and neglect uncertainty in the inferred state. Active inference (AIF) is a framework naturally formulated in POMDPs and directs agents to select actions by minimizing a function called expected free energy (EFE). This supplies reward-maximizing (or exploitative) behavior, as in RL, with information-seeking (or exploratory) behavior. Despite this exploratory behavior of AIF, its use is limited to problems with small time horizons and discrete spaces due to the computational challenges associated with EFE. In this article, we propose a unified principle that establishes a theoretical connection between AIF and RL, enabling seamless integration of these two approaches and overcoming their limitations in continuous space POMDP settings. We substantiate our findings with rigorous theoretical analysis, providing novel perspectives for using AIF in designing and implementing artificial agents. 
Experimental results demonstrate the superior learning capabilities of our method compared to alternative RL approaches in solving partially observable tasks with continuous spaces. Notably, our approach harnesses information-seeking exploration, enabling it to effectively solve reward-free problems and rendering explicit task reward design by an external supervisor optional.
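To make "inferring the true state of the environment from observed data" concrete, here is a minimal sketch of the standard Bayesian belief update for a POMDP. It assumes finite state, action, and observation sets, whereas the abstract above targets continuous spaces, so it illustrates only the underlying inference step and not the article's method. The function name `belief_update` and the array layouts `T[a, s, s']` and `O[a, s', o]` are assumptions made for this example.

```python
import numpy as np


def belief_update(belief, T, O, action, obs):
    """One step of a discrete Bayes filter.

    belief: shape (S,), current distribution over hidden states.
    T: shape (A, S, S), T[a, s, s2] = P(s2 | s, a)   (assumed layout).
    O: shape (A, S, Z), O[a, s2, o] = P(o | s2, a)   (assumed layout).
    Returns the posterior distribution over the next hidden state.
    """
    predicted = belief @ T[action]               # predict: sum_s b(s) P(s2 | s, a)
    unnormalized = O[action][:, obs] * predicted  # correct: weight by P(o | s2, a)
    total = unnormalized.sum()
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return unnormalized / total


if __name__ == "__main__":
    # Two hidden states, one 'listen' action that leaves the state unchanged
    # and reports the correct state with probability 0.85 (made-up numbers).
    T = np.array([[[1.0, 0.0], [0.0, 1.0]]])
    O = np.array([[[0.85, 0.15], [0.15, 0.85]]])
    b = np.array([0.5, 0.5])
    b = belief_update(b, T, O, action=0, obs=0)
    print(b)
```

Maintaining such a belief is what becomes expensive or intractable with long horizons and high-dimensional or continuous spaces, which is the limitation the abstract points to.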

Articles published on Partial Observations

Discovering dynamics and parameters of nonlinear oscillatory and chaotic systems from partial observations

Formal contracts mitigate social dilemmas in multi-agent reinforcement learning

Iterative algorithms for solving one-sided partially observable stochastic shortest path games

Collaborative Learning-Based Spectrum Sensing Under Partial Observations

Neural network-based reconstruction of steady-state temperature systems with unknown material composition

Multi-agent penetration testing based on AIRL

Improving community detection in blockmodel by distance-based observation selection

A model template for reachability-based containment checking of imprecise observations in timed automata

Active Inference and Reinforcement Learning: A Unified Inference on Continuous State and Action Spaces Under Partial Observability.

Deep Reinforcement Learning for Fluid Mechanics: Control, Optimization, and Automation

Deep reinforcement learning of airfoil pitch control in a highly disturbed environment using partial observations

Path detectability verification for time-dependent systems with application to flexible manufacturing systems

Inference on the Macroscopic Dynamics of Spiking Neurons.

Stochastic EM algorithm for partially observed stochastic epidemics with individual heterogeneity.

Joint computation offloading and resource allocation for end-edge collaboration in internet of vehicles via multi-agent reinforcement learning

ANISE: Assembly-Based Neural Implicit Surface Reconstruction.

Graph learning from incomplete graph signals: From batch to online methods

Codiscovering graphical structure and functional relationships within data: A Gaussian Process framework for connecting the dots

Data Assimilation in Chaotic Systems Using Deep Reinforcement Learning

Path-Wise Continuous-Time Transmission with Applications in Source Identification from Partial Observations
