Abstract

We present a mathematical framework and computational methods for optimally designing a finite sequence of experiments. This sequential optimal experimental design (sOED) problem is formulated as a finite-horizon partially observable Markov decision process (POMDP) in a Bayesian setting with information-theoretic utilities. The formulation is general and can accommodate continuous random variables, non-Gaussian posteriors, and nonlinear forward models. The sOED design policy incorporates feedback and lookahead simultaneously, and we show that it generalizes the commonly used batch and greedy design strategies. We solve for the sOED policy using the policy gradient (PG) method from reinforcement learning, and provide a derivation of the PG expression in the sOED context. Adopting an actor–critic approach, we parameterize the policy and value functions with deep neural networks and improve them via PG estimates computed from simulated episodes of designs and observations. The new PG-sOED algorithm is first validated on a linear-Gaussian benchmark, and then compared against other design baselines on a sensor movement problem for contaminant source inversion in a convection–diffusion field. We explain the observed policy behaviors using knowledge of the underlying physical process.
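To make the high-level recipe concrete, below is a minimal, self-contained sketch (not the paper's implementation) of learning a design policy from simulated episodes with a plain REINFORCE-style policy gradient. It assumes a toy one-parameter linear-Gaussian model, a Gaussian policy over a scalar design whose mean is a linear function of the belief state (posterior mean and variance), and an entropy-reduction reward as the information-gain utility; all names and constants are illustrative.

    # Toy policy-gradient sketch for sequential design (illustrative assumptions only).
    import numpy as np

    rng = np.random.default_rng(0)
    N_STAGES, N_EPISODES, LR, POLICY_STD = 2, 2000, 0.05, 0.3
    NOISE_STD, PRIOR_MEAN, PRIOR_VAR = 1.0, 0.0, 9.0

    # Policy: mean design is a linear function of the belief state [post_mean, post_var, 1].
    w = np.zeros(3)

    def policy_mean(mean, var):
        return w @ np.array([mean, var, 1.0])

    def entropy(var):
        # Differential entropy of a 1D Gaussian with this variance.
        return 0.5 * np.log(2.0 * np.pi * np.e * var)

    for ep in range(N_EPISODES):
        theta = rng.normal(PRIOR_MEAN, np.sqrt(PRIOR_VAR))   # "true" parameter for this episode
        mean, var = PRIOR_MEAN, PRIOR_VAR                     # belief state (posterior mean, variance)
        grads, rewards = [], []
        for k in range(N_STAGES):
            mu_d = policy_mean(mean, var)
            d = rng.normal(mu_d, POLICY_STD)                  # sample a design from the Gaussian policy
            y = theta * d + rng.normal(0.0, NOISE_STD)        # simulate the experiment outcome
            # Conjugate Gaussian posterior update for theta given (d, y).
            new_var = 1.0 / (1.0 / var + d**2 / NOISE_STD**2)
            new_mean = new_var * (mean / var + d * y / NOISE_STD**2)
            rewards.append(entropy(var) - entropy(new_var))   # entropy reduction as information-gain proxy
            # Score function for the Gaussian policy: d log pi(d | state) / d w.
            grads.append((d - mu_d) / POLICY_STD**2 * np.array([mean, var, 1.0]))
            mean, var = new_mean, new_var
        ret = sum(rewards)                                    # total information gain over the episode
        for g in grads:
            w += LR * g * ret                                 # REINFORCE ascent on the policy-gradient estimate

In the actor–critic setting described in the abstract, the total-return weight in the final update would instead be an advantage estimate from a learned value network, and both the policy and value functions would be deep neural networks rather than the linear map used in this toy example.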
