Partial Observations and Belief States


Similar Papers
  • Research Article
  • Cited by 11
  • 10.1002/ece3.9197
Partial observability and management of ecological systems.
  • Sep 1, 2022
  • Ecology and evolution
  • Byron K Williams + 1 more

The actual state of ecological systems is rarely known with certainty, but management actions must often be taken regardless of imperfect measurement (partial observability). Because of the difficulties in accounting for partial observability, it is usually treated in an ad hoc fashion, or simply ignored altogether. Yet incorporating partial observability into decision processes lends a realism that has the potential to improve ecological outcomes significantly. We review frameworks for dealing with partial observability, focusing specifically on dynamic ecological systems with Markovian transitions, i.e., transitions among system states that are influenced by the current system state and management action over time. Fully observable states are represented in an observable Markov decision process (MDP), whereas obscure or hidden states are represented in a partially observable process (POMDP). POMDPs can be seen as a natural extension of observable MDPs. Management under partial observability generalizes the situation for complete observability, by recognizing uncertainty about the system's state and incorporating sequential observations associated with, but not the same as, the states themselves. Decisions that otherwise would depend on the actual state must be based instead on state probability distributions ("belief states"). Partial observability requires adaptation of the entire decision process, including the use of belief states and Bayesian updates, valuation that includes expectations over observations, and optimal strategy that identifies actions for belief states over a continuous belief space. We compare MDPs and POMDPs and highlight POMDP applications to some common ecological problems. We clarify the structure and operations, approaches for finding solutions, and analytic challenges of POMDPs for practicing ecologists. Both observable and partially observable MDPs can use an inductive approach to identify optimal strategies and values, with a considerable increase in mathematical complexity with POMDPs. Better understanding of POMDPs can help decision makers manage imperfectly measured ecological systems more effectively.
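The Bayesian belief update this abstract refers to can be sketched in a few lines. This is a generic POMDP belief update, not code from the paper; the transition and observation matrices below are toy assumptions.

```python
import numpy as np

def belief_update(b, T, O, a, o):
    """Bayes update of a belief state after taking action a and observing o.

    b: current belief over states, shape (S,)
    T: transition probabilities, T[a, s, s2] = P(s2 | s, a)
    O: observation probabilities, O[a, s2, o] = P(o | s2, a)
    """
    predicted = b @ T[a]                  # predictive distribution over next states
    unnormalized = predicted * O[a, :, o] # weight by observation likelihood
    return unnormalized / unnormalized.sum()

# Toy two-state system (e.g., "healthy"/"degraded") with a noisy sensor
# that reports the true state with probability 0.8.
T = np.array([[[0.9, 0.1],
               [0.2, 0.8]]])             # one management action
O = np.array([[[0.8, 0.2],
               [0.2, 0.8]]])
b = np.array([0.5, 0.5])
b_new = belief_update(b, T, O, a=0, o=0)  # observing "healthy" shifts belief
```

The update is exactly the filtering step that replaces direct state knowledge in a POMDP: predict forward through the dynamics, then reweight by how likely the observation is in each state.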

  • Conference Article
  • Cited by 1
  • 10.1109/ieeeconf53345.2021.9723197
Communication-Free Two-Stage Multi-Agent DDPG under Partial States and Observations
  • Oct 31, 2021
  • Joohyun Cho + 3 more

In this work, we propose a two-stage multi-agent deep deterministic policy gradient (TS-MADDPG) algorithm for communication-free, multi-agent reinforcement learning (MARL) under partial states and observations. In the first stage, we train prototype actor-critic networks using only partial states at actors. In the second stage, we incorporate partial observations resulting from prototype actions as side information at actors to enhance actor-critic training. This side information is useful to infer the unobserved states and hence, can help reduce the performance gap between a network with fully observable states and a partially observable one. Using a case study of building energy control in the power distribution network, we successfully demonstrate that the proposed TS-MADDPG can greatly improve the performance of single-stage MADDPG algorithms that use partial states only. This is the first work that utilizes partial local voltage measurements as observations to improve the MARL performance for a distributed power network.

  • Research Article
  • Cited by 4
  • 10.1145/3428268
Programming and reasoning with partial observability
  • Nov 13, 2020
  • Proceedings of the ACM on Programming Languages
  • Eric Atkinson + 1 more

Computer programs are increasingly being deployed in partially observable environments. A partially observable environment is an environment whose state is not completely visible to the program, but from which the program receives partial observations. Developers typically deal with partial observability by writing a state estimator that, given observations, attempts to deduce the hidden state of the environment. In safety-critical domains, developers may write an environment model to formally verify safety properties. The model captures the relationship between observations and hidden states and is used to prove the software correct. In this paper, we present a new methodology for writing and verifying programs in partially observable environments. We present belief programming, a programming methodology where developers write an environment model that the program runtime automatically uses to perform state estimation. A belief program dynamically updates and queries a belief state that captures the possible states the environment could be in. To enable verification, we present Epistemic Hoare Logic, which reasons about the possible belief states of a belief program the same way that classical Hoare logic reasons about the possible states of a program. We develop these concepts by defining a semantics and a program logic for a simple core language called BLIMP. In a case study, we show how belief programming could be used to write and verify a controller for the Mars Polar Lander in BLIMP. We present an implementation of BLIMP called CBLIMP and evaluate it to determine the feasibility of belief programming.
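The core idea of a runtime-maintained belief state can be sketched as a set of environment states filtered by observations. This is an illustrative set-based version, not the BLIMP language or its API; all names here are hypothetical.

```python
# A minimal set-based belief state: the belief is the set of environment
# states consistent with all observations so far.

def update_belief(belief, transition, consistent, obs):
    """Advance every possible state, then keep those matching the observation."""
    successors = {s2 for s in belief for s2 in transition(s)}
    return {s for s in successors if consistent(s, obs)}

# Toy environment model: a counter that may or may not increment each
# step, observed only through its parity.
def transition(s):
    return {s, s + 1}

def consistent(s, parity):
    return s % 2 == parity

belief = {0}
belief = update_belief(belief, transition, consistent, obs=1)  # odd parity observed
```

Queries against the belief (e.g., "is the counter certainly below 5 in every possible state?") are then universally quantified over this set, which is the shape of property that an epistemic program logic reasons about.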

  • Conference Article
  • Cited by 2
  • 10.1109/wi-iat.2012.161
Abstraction in Model Based Partially Observable Reinforcement Learning Using Extended Sequence Trees
  • Dec 1, 2012
  • Erkin Çilden + 1 more

The extended sequence tree is a direct method for the automatic generation of useful abstractions in reinforcement learning, designed for problems that can be modelled as a Markov decision process. This paper proposes a method to expand the extended sequence tree method over reinforcement learning to cover partial observability, formalized via the partially observable Markov decision process through the belief state formalism. This expansion requires a reasonable approximation of the information state. Inspired by statistical ranking, a simple but effective discretization schema over the belief state space is defined. The extended sequence tree method is modified to make use of this schema under partial observability, and the effectiveness of the resulting algorithm is shown by experiments on some benchmark problems.
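A ranking-inspired discretization of the belief space could, as a rough illustration, map each belief vector to the ordering of its probabilities plus a coarse quantization level. This is a guess at the flavor of such a schema, not the paper's exact method; `rank_signature` and its parameters are made up for this sketch.

```python
def rank_signature(belief, bins=4):
    """Illustrative belief-space discretization: the sort order of the
    states by probability, plus a coarse quantization of each probability.
    Nearby beliefs map to the same discrete signature."""
    order = tuple(sorted(range(len(belief)), key=lambda i: -belief[i]))
    levels = tuple(min(int(p * bins), bins - 1) for p in belief)
    return order + levels

sig_a = rank_signature([0.70, 0.20, 0.10])
sig_b = rank_signature([0.72, 0.18, 0.10])  # a nearby belief: same signature
sig_c = rank_signature([0.20, 0.70, 0.10])  # different ranking: new signature
```

Any such many-to-one mapping turns the continuous belief simplex into a finite set of abstract states, which is what lets a tree-based abstraction method built for MDPs be reused under partial observability.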

  • Book Chapter
  • 10.1007/978-3-031-20862-1_8
Generalized 3-Valued Belief States in Conformant Planning
  • Jan 1, 2022
  • Saurabh Fadnis + 1 more

The high complexity of planning with partial observability has motivated the search for compact representations of belief states (sets of states) that reduce their size exponentially, including the 3-valued literal-based approximations by Baral et al. and the tag-based approximations by Palacios and Geffner. We present a generalization of 3-valued literal-based approximations, and an algorithm that analyzes a succinctly represented planning problem to derive a set of formulas whose truth accurately represents any reachable belief state. This set is not limited to literals and can contain arbitrary formulas. We demonstrate that a factored representation of belief states based on this analysis enables fully automated reduction of conformant planning problems to classical planning, bypassing some of the limitations of earlier approaches.
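The 3-valued literal-based approximation mentioned above can be illustrated in miniature: each variable is abstracted to True, False, or Unknown depending on whether it holds in all, none, or only some of the states in the belief set. This is a hedged sketch of the general idea, not the authors' generalization; the variable names are invented.

```python
# 3-valued literal-based approximation of a belief set (in the style of
# Baral et al.): one value per variable instead of a set of full states.

UNKNOWN = "?"

def approximate(belief_states, variables):
    approx = {}
    for v in variables:
        values = {s[v] for s in belief_states}
        approx[v] = values.pop() if len(values) == 1 else UNKNOWN
    return approx

# Two possible worlds: the door is certainly open, the light is uncertain.
states = [{"door_open": True, "light_on": True},
          {"door_open": True, "light_on": False}]
abstraction = approximate(states, ["door_open", "light_on"])
```

The payoff is size: a belief set with up to 2^n states collapses to n three-valued entries. The cost is precision, since correlations between variables (e.g., "the light is on iff the door is open") are lost, which is exactly the limitation that richer formula-based representations address.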

  • Conference Article
  • Cited by 5
  • 10.1109/taai.2011.55
Consistent Belief State Estimation, with Application to Mines
  • Nov 1, 2011
  • Adrien Couëtoux + 2 more

Estimating the belief state is the main issue in games with partial observation. It is commonly done by heuristic methods, with no mathematical guarantee. We here focus on mathematically consistent belief state estimation methods, in the case of one-player games. We clearly separate the search algorithm (which might be e.g. alpha-beta or Monte-Carlo Tree Search) and the belief state estimation. We basically propose rejection methods and simple Markov chain Monte Carlo (MCMC) methods, with a time budget proportional to the time spent by the search algorithm on the situation at which the belief state is to be estimated; this is conveniently approximated by the number of simulations in the current node. While the approach is intended to be generic, we perform experiments on the well-known Mines game, available on most Windows and Linux distributions. Interestingly, it detects non-trivial facts, e.g. the fact that the probability of winning the game is not the same for different moves, even those with the same probability of immediate death. The rejection method, which is slow but has no parameter and which is consistent in a non-asymptotic setting, performed better than the MCMC method in spite of tuning efforts.
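Rejection-based belief estimation as described here is simple to sketch: draw hidden states from the prior and keep only those consistent with what has been observed. The Mines-flavored example below is an invented toy, not the paper's experimental setup.

```python
import random

def rejection_belief(sample_prior, consistent, obs, budget):
    """Consistent belief estimation by rejection: draw hidden states from
    the prior and keep only those that could have produced the observation.
    The accepted samples are exact draws from the posterior belief."""
    return [s for s in (sample_prior() for _ in range(budget))
            if consistent(s, obs)]

# Toy Mines-like setting: one mine hidden uniformly in 3 cells; we have
# observed that cell 0 is safe, so only positions 1 and 2 remain possible.
random.seed(0)
samples = rejection_belief(sample_prior=lambda: random.randrange(3),
                           consistent=lambda s, o: s != o,
                           obs=0, budget=300)
```

Its appeal is exactly what the abstract notes: it has no tuning parameters and the accepted samples are unbiased posterior draws even for a finite budget, at the price of wasting every rejected sample.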

  • Conference Article
  • 10.1109/cac.2013.6775696
A Monte Carlo approach for approximate belief state estimation of dynamic system
  • Nov 1, 2013
  • Gan Zhou + 4 more

Given a system model and a set of observations, model-based monitoring and diagnosis of discrete dynamic systems is often cast as the task of determining the likely belief state of components. This problem is hard because the complexity is exponential in both the number of components and the number of time steps. In this paper, an innovative approximate estimation algorithm, coined MCBSE (Monte Carlo-based Belief State Enumeration), is presented. MCBSE adopts Monte Carlo techniques to efficiently maintain a partial belief state. Moreover, the 'first update, next allocate' strategy uses the observation at time t+1 to calculate the real transition probability, and then distributes the particles at time t. This significantly improves the accuracy of the estimator and avoids losing solutions. Empirical results show that MCBSE clearly outperforms BFTE (Best-First Trajectory Enumeration).
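The 'first update, next allocate' idea can be sketched as a particle step that weights each candidate transition by the likelihood of the *next* observation before allocating particles, rather than propagating blindly and reweighting afterwards. This is a generic look-ahead particle filter sketch under invented toy dynamics, not the MCBSE implementation.

```python
import random

def step(particles, transition_probs, obs_likelihood, obs):
    """One 'first update, next allocate' step: weight each particle's
    possible transitions by the likelihood of the upcoming observation,
    then allocate (resample) particles from that weighted mixture."""
    weighted = []
    for s in particles:
        for s2, p in transition_probs(s):
            weighted.append((s2, p * obs_likelihood(s2, obs)))
    total = sum(w for _, w in weighted)
    states = [s for s, _ in weighted]
    weights = [w / total for _, w in weighted]
    return random.choices(states, weights=weights, k=len(particles))

# Toy two-mode component (0 = ok, 1 = faulty); faults persist, and the
# sensor reports the true mode with probability 0.9.
def transition_probs(s):
    return [(0, 0.95), (1, 0.05)] if s == 0 else [(1, 1.0)]

def obs_likelihood(s, obs):
    return 0.9 if s == obs else 0.1

random.seed(1)
particles = [0] * 100
particles = step(particles, transition_probs, obs_likelihood, obs=1)
```

Because the observation enters before allocation, no particle is ever committed to a successor that the next observation then rules out, which is how this style of filter avoids losing low-probability fault hypotheses.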

  • Research Article
  • Cited by 235
  • 10.3166/jancl.21.9-34
Epistemic planning for single- and multi-agent systems
  • Jan 1, 2011
  • Journal of Applied Non-Classical Logics
  • Thomas Bolander + 1 more

In this paper, we investigate the use of event models for automated planning. Event models are the action defining structures used to define a semantics for dynamic epistemic logic. Using event models, two issues in planning can be addressed: partial observability of the environment, and knowledge. In planning, partial observability gives rise to an uncertainty about the world. For single-agent domains, this uncertainty can come from incomplete knowledge of the starting situation and from the nondeterminism of actions. In multi-agent domains, an additional uncertainty arises from the fact that other agents can act in the world, causing changes that are not instigated by the agent itself. For an agent to successfully construct and execute plans in an uncertain environment, the most widely used formalism in the literature on automated planning is "belief states": sets of different alternatives for the current state of the world. Epistemic logic is a significantly more expressive and theoretically better founded method for representing knowledge and ignorance about the world. Further, epistemic logic allows for planning according to the knowledge (and iterated knowledge) of other agents, allowing the specification of a more complex class of planning domains than those concerned only with simple facts about the world. We show how to model multi-agent planning problems using Kripke models for representing world states, and event models for representing actions. Our mechanism makes use of slight modifications to these concepts, in order to model the internal view of agents, rather than that of an external observer. We define a type of planning domain called epistemic planning domains, a generalisation of classical planning domains, and show how epistemic planning can successfully deal with partial observability, nondeterminism, knowledge and multiple agents. Finally, we show epistemic planning to be decidable in the single-agent case, but only semi-decidable in the multi-agent case.

  • Research Article
  • Cited by 15
  • 10.1109/tac.2012.2206718
Optimal Stopping Under Partial Observation: Near-Value Iteration
  • Feb 1, 2013
  • IEEE Transactions on Automatic Control
  • Enlu Zhou

We propose a new approximate value iteration method, namely near-value iteration (NVI), to solve continuous-state optimal stopping problems under partial observation, which in general cannot be solved analytically and also pose a great challenge to numerical solutions. NVI is motivated by the expression of the value function as the supremum over an uncountable set of linear functions in the belief state. After a smart manipulation of the operations in the updating equation for the value function, we reduce the set to only two functions at every time step, so as to achieve significant computational savings. NVI yields a value function approximation bounded by the tightest lower and upper bounds that can be achieved by existing algorithms in the same class, so the NVI approximation is closer to the true value function than at least one of these bounds. We demonstrate the effectiveness of our approach on an example of pricing American options under stochastic volatility.
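The representation the abstract starts from, the value function as a supremum of linear functions of the belief, is easy to illustrate: each linear piece is an "alpha vector", and the value at a belief is the best inner product. The stopping example below is a toy of this general piecewise-linear structure, not the NVI algorithm itself.

```python
import numpy as np

def value(belief, alpha_vectors):
    """Value of a belief as the max over a finite set of linear pieces.
    Keeping only a subset of the (in general uncountably many) pieces
    yields a lower approximation of the true value function."""
    return max(float(np.dot(alpha, belief)) for alpha in alpha_vectors)

# Toy optimal-stopping flavor: one alpha vector for stopping now, one
# rough estimate for continuing (both illustrative numbers).
stop_now = np.array([10.0, 0.0])    # payoff of stopping in each hidden state
keep_going = np.array([4.0, 6.0])   # continuation-value estimate
b = np.array([0.3, 0.7])
v = value(b, [stop_now, keep_going])
```

Methods in this class differ mainly in how many linear pieces they retain per time step; reducing the set to two, as NVI does, is what buys the computational savings while keeping the approximation sandwiched between known bounds.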

  • Conference Article
  • Cited by 8
  • 10.1109/ths.2017.7943478
Agent-centric approach for cybersecurity decision-support with partial observability
  • Apr 1, 2017
  • Ramakrishna Tipireddy + 4 more

Generating automated cyber resilience policies for real-world settings is a challenging research problem that must account for uncertainties in system state over time and dynamics between attackers and defenders. In addition to understanding attacker and defender motives and tools, and identifying “relevant” system and attack data, it is also critical to develop rigorous mathematical formulations representing the defender's decision-support problem under uncertainty. Game-theoretic approaches involving cyber resource allocation optimization with Markov decision processes (MDP) have been previously proposed in the literature. However, as is the case in strategic card games such as poker, research challenges using game-theoretic approaches for practical cyber defense applications include equilibrium solvability, existence, and possible multiplicity. Moreover, mixed uncertainties associated with player payoffs also need to be accounted for within game settings. This paper proposes an agent-centric approach for cybersecurity decision-support with partial system state observability. Multiple partially observable MDP (POMDP) problems are formulated and solved from a cyber defender's perspective, against a fixed attacker type, using synthetic (notional) system and attack parameters estimated from a Monte Carlo based sampling scheme. The agent-centric problem formulation helps address equilibrium related research challenges and represents a step toward automated and dynamic cyber resilience policy generation and implementation.

  • Research Article
  • Cited by 2
  • 10.1103/prxenergy.4.013003
Detecting Attacks and Estimating States of Power Grids from Partial Observations with Machine Learning
  • Feb 4, 2025
  • PRX Energy
  • Zheng-Meng Zhai + 2 more

The ever-increasing complexity of modern power grids makes them vulnerable to cyber and/or physical attacks. To protect them, accurate attack detection is essential. A challenging scenario is that a localized attack has occurred on a specific transmission line but only a small number of transmission lines elsewhere can be monitored. That is, full state observation of the whole power grid is not feasible, so attack detection and state estimation need to be done with only limited, partial state observations. We articulate a machine-learning framework to address this problem, where the necessity to deal with sequential time-series data with dynamical memories and to avoid a vanishing gradient has led us to choose the long short-term memory (LSTM) architecture. Leveraging the inherent capabilities of LSTM to handle sequential data and capture temporal dependencies, we demonstrate, using three benchmark power-grid networks, that the complete dynamical state of the whole power grid can be faithfully reconstructed and the attack can be accurately localized from limited, partial state observations even in the presence of noise. The performance improves as more observations become available. Further justification for using the LSTM is provided by comparing its performance with that of alternative machine-learning architectures such as feedforward neural networks and random forests. Despite the extensive existing literature on applications of LSTM to power grids, to our knowledge, the problem of locating an attack and estimating the state from limited observations had not been addressed before our work. The method developed can potentially be generalized to broad complex cyber-physical systems.

  • Conference Article
  • Cited by 16
  • 10.1109/coase.2017.8256101
Matrix approach to detectability of discrete event systems under partial observation
  • Aug 1, 2017
  • Xiaoguang Han + 2 more

In this paper, we investigate the problem of detectability of nondeterministic discrete event systems (DESs) with partial event observation and partial state observation (partially observed DESs for short). Concretely, this includes several aspects. First, we assume that we do not know initially which state the system is in. To discuss the detectability property of partially observed DESs (i.e., how to determine the current and subsequent states of a partially observed DES after a finite number of observations), we introduce two key notions, namely the unobservable reach and the detector for a partially observed DES. Second, the dynamics of a detector are converted into an algebraic representation under the framework of the Boolean semi-tensor product (STP) of matrices. Using it, necessary and sufficient conditions are presented to verify whether a partially observed DES is detectable. Compared with existing approaches, the proposed approach is easier and more direct since it involves only straightforward matrix manipulations. Finally, we illustrate the application of the proposed approach to the verification of the detectability property of partially observed DESs by means of two examples.
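The unobservable reach mentioned above is, informally, the set of states the system might silently drift to: everything reachable from a state set via unobservable events only. A direct fixed-point computation of it can be sketched as follows (generic set/BFS form, not the paper's semi-tensor-product formulation; the transition data are invented).

```python
# Unobservable reach of a state set: all states reachable through
# unobservable events alone, computed as a breadth-first fixed point.

def unobservable_reach(states, transitions, unobservable):
    reach, frontier = set(states), list(states)
    while frontier:
        s = frontier.pop()
        for src, event, dst in transitions:
            if src == s and event in unobservable and dst not in reach:
                reach.add(dst)
                frontier.append(dst)
    return reach

# Toy DES: events "u" are silent, "a" is observable.
transitions = [(0, "u", 1), (1, "u", 2), (2, "a", 3)]
reach = unobservable_reach({0}, transitions, unobservable={"u"})
```

A detector then alternates this closure with observable-event steps: each observation maps the current state estimate to the unobservable reach of its observable successors, and detectability asks whether these estimates eventually shrink to singletons.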

  • Research Article
  • Cited by 2
  • 10.1103/physreve.108.064209
Variability of echo state network prediction horizon for partially observed dynamical systems.
  • Dec 20, 2023
  • Physical Review E
  • Ajit Mahata + 2 more

The study of dynamical systems using partial state observation is an important problem due to its applicability to many real-world systems. We address the problem by studying an echo state network (ESN) framework with partial state input and partial or full state output. Applications to the Lorenz system and Chua's oscillator (both numerically simulated and experimental systems) demonstrate the effectiveness of our method. We show that the ESN, as an autonomous dynamical system, is capable of making short-term predictions up to a few Lyapunov times. However, the prediction horizon has high variability depending on the initial condition, an aspect that we explore in detail using the distribution of the prediction horizon. Further, using a variety of statistical metrics to compare the long-term dynamics of the ESN predictions with numerically simulated or experimental dynamics, we show that the ESN can effectively learn the system's dynamics even when trained with noisy numerical or experimental data sets. Thus, we demonstrate the potential of ESNs to serve as cheap surrogate models for simulating the dynamics of systems where complete observations are unavailable.

  • Conference Article
  • Cited by 6
  • 10.1109/icra48506.2021.9561664
Control-Tree Optimization: an approach to MPC under discrete Partial Observability
  • May 30, 2021
  • Camille Phiquepal + 1 more

This paper presents a new approach to Model Predictive Control for environments where essential discrete variables are partially observed. Under this assumption, the belief state is a probability distribution over a finite number of states. We optimize a control-tree where each branch assumes a given state-hypothesis. The control-tree optimization uses the probabilistic belief state information. This leads to policies that are more optimized with respect to likely states than unlikely ones, while still guaranteeing robust constraint satisfaction at all times. We apply the method to both linear and non-linear MPC with constraints. The optimization of the control-tree is decomposed into optimization subproblems that are solved in parallel, leading to good scalability for a large number of state-hypotheses. We demonstrate the real-time feasibility of the algorithm on two examples and show the benefits compared to a classical MPC scheme optimizing with respect to a single hypothesis.
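The control-tree objective can be illustrated in one dimension: a shared first control input is followed by one branch per state-hypothesis, and the total cost weights each branch by its belief probability. This is a minimal one-step toy of the weighting idea, not the paper's MPC formulation; all numbers and names are invented.

```python
# One-step control-tree flavor: choose a shared first action; each
# state-hypothesis contributes its branch cost weighted by its
# probability under the belief.

def expected_cost(first_action, branch_costs, belief):
    """Belief-weighted cost of a shared first action followed by
    per-hypothesis branches given as (target, remaining_cost) pairs."""
    return sum(p * (abs(first_action - target) + tail)
               for p, (target, tail) in zip(belief, branch_costs))

# Two hypotheses about the environment, one far more likely than the other.
belief = [0.8, 0.2]
branches = [(1.0, 0.5), (-1.0, 0.5)]   # (branch target, tail cost)

# Brute-force the best shared first action on a coarse grid.
best = min((expected_cost(u / 10, branches, belief), u / 10)
           for u in range(-10, 11))
```

The belief weighting pulls the shared action toward the likely hypothesis (here, toward target 1.0), while constraint satisfaction in the full method is enforced on every branch regardless of its probability.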

  • Research Article
  • Cited by 2
  • 10.1109/lwc.2021.3111750
Decentralized Decision for Multi-Band Sensing: A Deep Reinforcement Learning Approach
  • Dec 1, 2021
  • IEEE Wireless Communications Letters
  • Li Li + 5 more

This letter focuses on seeking a robust decentralized solution to multi-band sensing-decision-making (MBSDM) for cognitive wireless networks (CWNs). As the MBSDM process of each agent in a CWN can be regarded as a partially observable Markov decision process (POMDP), we propose a distributed MBSDM algorithm based on distributed reinforcement learning with the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) strategy to overcome partial observability and prohibitive computation. The MADDPG is implemented with offline centralized training and online decentralized execution. For centralized training, the network of each agent is trained offline with the mixed actions and state information in the experience pool. In decentralized execution, each agent calculates its local observation based on its belief state and takes actions with the well-trained network independently. Compared with existing algorithms, simulation results showcase the effectiveness and robustness of the proposed MADDPG-based decentralized MBSDM algorithm.
