Abstract

We present a mathematical framework and computational methods for optimally designing a finite sequence of experiments. This sequential optimal experimental design (sOED) problem is formulated as a finite-horizon partially observable Markov decision process (POMDP) under a Bayesian setting and with information-theoretic utilities. The formulation is general and may accommodate continuous random variables, non-Gaussian posteriors, and nonlinear forward models. The sOED design policy incorporates elements of feedback and lookahead simultaneously, and we show that it generalizes the commonly used batch and greedy design strategies. We solve for the sOED policy using the policy gradient (PG) method from reinforcement learning, and provide a derivation of the PG expression in the sOED context. Adopting an actor–critic approach, we parameterize the policy and value functions using deep neural networks and improve them via PG estimates produced from simulated episodes of designs and observations. The new PG-sOED algorithm is first validated on a linear-Gaussian benchmark and then compared against other design baselines on a sensor movement problem for contaminant source inversion in a convection–diffusion field. We explain the resulting policy behaviors using knowledge of the underlying physical process.

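For illustration only, the sketch below shows one way an actor–critic policy-gradient loop of the kind summarized above can be organized, on a toy sequential design problem that is linear-Gaussian (loosely analogous to the paper's first validation case). It is a minimal sketch, not the authors' PG-sOED implementation: the two-experiment horizon, the forward model y = θ·d + noise, the network sizes, the Gaussian exploration policy, the REINFORCE-with-baseline update, and the closed-form information-gain reward are all assumptions made here for a self-contained example. In the general sOED setting described in the abstract, the terminal reward would instead be an information-theoretic utility estimated for possibly non-Gaussian posteriors.

```python
# Minimal actor-critic policy-gradient sketch for a toy sequential design
# problem (assumptions only; not the paper's PG-sOED implementation).
import math
import torch
import torch.nn as nn
from torch.distributions import Normal

torch.manual_seed(0)

N_EXP = 2              # experiments per episode (horizon); assumed
SIGMA_OBS = 0.1        # observation-noise standard deviation; assumed
POLICY_STD = 0.1       # exploration std of the Gaussian policy; assumed
STATE_DIM = 2 * N_EXP  # state: past (design, observation) pairs, zero-padded


class Actor(nn.Module):
    """Policy network: maps the design/observation history to a design mean."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, s):
        return self.net(s)


class Critic(nn.Module):
    """Value network: estimates expected remaining information gain of a state."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, s):
        return self.net(s)


actor, critic = Actor(), Critic()
opt_actor = torch.optim.Adam(actor.parameters(), lr=1e-3)
opt_critic = torch.optim.Adam(critic.parameters(), lr=1e-3)


def simulate_episode():
    """Roll out one episode: draw theta from its N(0, 1) prior, run N_EXP
    experiments under the current policy, and return the visited states, the
    log-probabilities of the sampled designs, and the terminal reward."""
    theta = torch.randn(1)          # "true" parameter drawn from the prior
    s = torch.zeros(STATE_DIM)
    states, log_probs = [], []
    precision = 1.0                 # prior precision (prior variance = 1)
    for k in range(N_EXP):
        states.append(s.clone())
        dist = Normal(actor(s), POLICY_STD)         # stochastic Gaussian policy
        d = dist.sample()
        log_probs.append(dist.log_prob(d).squeeze())
        y = theta * d + SIGMA_OBS * torch.randn(1)  # assumed linear forward model
        precision += (d.item() ** 2) / SIGMA_OBS ** 2
        s[2 * k], s[2 * k + 1] = d.squeeze(), y.squeeze()
    # Terminal reward: information gain of the conjugate Gaussian posterior,
    # 0.5 * log(posterior precision / prior precision), exact for this toy model
    # and used here in place of a sampled information-theoretic utility.
    reward = 0.5 * math.log(precision)
    return states, log_probs, reward


for iteration in range(500):
    states, log_probs, reward = simulate_episode()
    S = torch.stack(states)
    values = critic(S).squeeze(-1)
    returns = torch.full((N_EXP,), reward)          # reward arrives only at the end
    advantages = (returns - values).detach()

    # Policy-gradient (REINFORCE with a learned baseline) update for the actor.
    actor_loss = -(torch.stack(log_probs) * advantages).mean()
    opt_actor.zero_grad()
    actor_loss.backward()
    opt_actor.step()

    # Fit the critic to the Monte Carlo returns.
    critic_loss = ((returns - values) ** 2).mean()
    opt_critic.zero_grad()
    critic_loss.backward()
    opt_critic.step()
```

Because the lookahead and feedback structure described in the abstract enters through the state (the history of designs and observations) and the learned value function, the same loop shape carries over when the toy forward model and closed-form reward are replaced by a nonlinear model and an estimated information gain.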