Sample-Efficient Iterative Lower Bound Optimization of Deep Reactive Policies for Planning in Continuous MDPs

Siow Meng Low, Akshat Kumar, Scott Sanner

doi:10.1609/aaai.v36i9.21220

Abstract

Recent advances in deep learning have enabled optimization of deep reactive policies (DRPs) for continuous MDP planning by encoding a parametric policy as a deep neural network and exploiting automatic differentiation in an end-to-end model-based gradient descent framework. This approach has proven effective for optimizing DRPs in nonlinear continuous MDPs, but it requires a large number of sampled trajectories to learn effectively and can suffer from high variance in solution quality. In this work, we revisit the overall model-based DRP objective and instead take a minorization-maximization perspective to iteratively optimize the DRP w.r.t. a locally tight lower-bounded objective. This novel formulation of DRP learning as iterative lower bound optimization (ILBO) is particularly appealing because (i) each step is structurally easier to optimize than the overall objective, (ii) it guarantees a monotonically improving objective under certain theoretical conditions, and (iii) it reuses samples between iterations thus lowering sample complexity. Empirical evaluation confirms that ILBO is significantly more sample-efficient than the state-of-the-art DRP planner and consistently produces better solution quality with lower variance. We additionally demonstrate that ILBO generalizes well to new problem instances (i.e., different initial states) without requiring retraining.
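The abstract describes each ILBO step as maximizing a lower bound on the objective that is tight at the current parameters. The paper's own bound is not reproduced here. As a hedged illustration of the general minorization-maximization pattern only, the sketch below uses the classic EM-style (Jensen) bound on expected reward for a hypothetical one-dimensional Gaussian policy. The names `mm_step`, `lower_bound`, `SIGMA`, and `TARGET`, and the toy reward, are illustrative assumptions. The sketch shows two of the properties the abstract lists. First, each step is structurally easier than the original objective: here the bound has a closed-form maximizer. Second, one batch of samples drawn at θ_k is reused to evaluate the bound for every candidate θ. In exact expectation the bound guarantees J(θ_{k+1}) ≥ J(θ_k). With Monte Carlo estimates the improvement holds only approximately. The log-based bound also assumes strictly positive rewards.

```python
import math
import random

# Hypothetical toy problem (not from the paper): a 1-D Gaussian "policy"
# a ~ N(theta, SIGMA^2) and a strictly positive reward R(a) = exp(-(a - TARGET)^2).
# The objective to maximize is J(theta) = E[R(a)].
SIGMA = 1.0
TARGET = 3.0


def reward(a: float) -> float:
    return math.exp(-(a - TARGET) ** 2)


def exact_objective(theta: float) -> float:
    """Closed form of J(theta) = E[exp(-(a - TARGET)^2)] for a ~ N(theta, SIGMA^2)."""
    s = 1.0 + 2.0 * SIGMA ** 2
    return math.exp(-((theta - TARGET) ** 2) / s) / math.sqrt(s)


def log_gauss(a: float, mean: float) -> float:
    """Log-density of N(mean, SIGMA^2) evaluated at a."""
    return -0.5 * ((a - mean) / SIGMA) ** 2 - math.log(SIGMA * math.sqrt(2.0 * math.pi))


def lower_bound(theta, theta_k, samples, weights, log_j_k):
    """Sample estimate of the Jensen (EM-style) lower bound on log J(theta).

    log J(theta) >= log J(theta_k) + E_q[log p_theta(a) - log p_theta_k(a)],
    where q(a) is proportional to R(a) * p_theta_k(a). The samples are drawn once
    at theta_k and reused for every theta; the bound is tight at theta = theta_k.
    """
    return log_j_k + sum(
        w * (log_gauss(a, theta) - log_gauss(a, theta_k))
        for a, w in zip(samples, weights)
    )


def mm_step(theta_k: float, n_samples: int, rng: random.Random):
    """One minorize-maximize step: sample at theta_k, then maximize the bound."""
    samples = [rng.gauss(theta_k, SIGMA) for _ in range(n_samples)]
    rewards = [reward(a) for a in samples]
    total = sum(rewards)
    weights = [r / total for r in rewards]  # self-normalized weights approximating q
    # With a fixed-variance Gaussian policy, maximizing the bound is a weighted
    # maximum-likelihood fit whose solution is the reward-weighted sample mean.
    theta_next = sum(w * a for a, w in zip(samples, weights))
    return theta_next, samples, weights


rng = random.Random(0)
theta = -2.0
for _ in range(30):
    theta, _, _ = mm_step(theta, 2000, rng)
print(f"theta = {theta:.3f}, J(theta) = {exact_objective(theta):.4f}")
```

Because the toy objective has a closed form, `exact_objective` can be used to check that the iterates improve J. In a real DRP setting, J is known only through sampled trajectories.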

Venue: Proceedings of the AAAI Conference on Artificial Intelligence

Similar Papers

Deep Bayesian active learning with image data
...
27 Nov 2017

Deep Metric Representation Learning for Clinical Resting State fMRI
Arunesh Mittal ... Paul Sajda
Annual International Conference of the IEEE Engineering in Medicine and Biology Society | VOL. 2022
11 Jul 2022

Multimodal MRI Segmentation of Brain Tissue and T2-Hyperintense White Matter Lesions in Multiple Sclerosis using Deep Convolutional Neural Networks and a Large Multi-center Image Database
Ponnada A Narayana ... Fred D Lublin
01 Dec 2018

Category-Level Adversaries for Semantic Domain Adaptation
Congcong Ruan ... Haifeng Hu
IEEE Access | VOL. 7
01 Jan 2019