Abstract

Reactive motion generation problems are usually solved by computing the action as a sum of policies. However, these policies are independent of each other and can therefore exhibit conflicting behaviors when their contributions are summed. We introduce Composable Energy Policies (CEP), a novel framework for modular reactive motion generation. CEP computes the control action by optimizing over the product of a set of stochastic policies. This product assigns high probability to actions that satisfy all the components and low probability to the rest. Optimizing over the product of policies avoids the detrimental effect of conflicting behaviors by choosing an action that satisfies all the objectives. Moreover, we show that CEP naturally adapts to the reinforcement learning problem, allowing us to integrate, in a hierarchical fashion, any distribution as a prior, from multimodal to non-smooth distributions, and to learn a new policy given them.
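
The following minimal sketch is not from the paper; every name and number in it is illustrative. It shows the simplest instance of the idea above: when each component policy is a Gaussian, their product is again Gaussian, and the action maximizing the product has a closed form that weights each component by its precision, rather than blindly averaging conflicting means as a sum of policies would.

```python
# Minimal sketch (not the authors' code): if every component policy is a
# Gaussian pi_i(a) = N(mu_i, Sigma_i), their product is again Gaussian and the
# action maximizing the product has a closed form. All values are illustrative.
import numpy as np

def product_of_gaussian_policies(mus, sigmas):
    """Return mean/covariance of prod_i N(a | mu_i, Sigma_i) (unnormalized)."""
    precisions = [np.linalg.inv(S) for S in sigmas]
    lam = sum(precisions)                         # combined precision
    eta = sum(P @ m for P, m in zip(precisions, mus))
    cov = np.linalg.inv(lam)
    mean = cov @ eta                              # argmax of the product
    return mean, cov

# Two 2-D policies that "disagree": summing their means would split the
# difference blindly, while the product weights each one by its confidence.
mu_a, sig_a = np.array([1.0, 0.0]), np.diag([0.1, 2.0])   # confident about x
mu_b, sig_b = np.array([0.0, 1.0]), np.diag([2.0, 0.1])   # confident about y
action, _ = product_of_gaussian_policies([mu_a, mu_b], [sig_a, sig_b])
print(action)   # close to [1, 1]: both objectives are satisfied
```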

Highlights

  • Many robotic tasks deal with finding a control action satisfying multiple objectives

  • Inspired by Artificial Potential Fields (APF) [21] and Riemannian Motion Policies (RMP) [33], we propose to model the composition of energies in different task spaces (see the sketch after this list)

  • Composable Energy Policies (CEP) performed better than the baselines but achieved less than a 50% success rate in both cage environments, suggesting that in complex scenarios an additional global path planner should be integrated with CEP
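
The second highlight, about composing energies defined in different task spaces, can be sketched as follows. This is a hedged illustration rather than the paper's implementation: the two-link arm, task maps, and energy terms are all assumed for the example. Each component defines an energy on its own task space (an end-effector goal attractor, an obstacle repulsor, and a joint-space regularizer); the energies are composed by pulling them back to configuration space through their task maps and following the gradient of their sum.

```python
# Hedged sketch of composing energies defined in different task spaces.
# Each component i has a task map f_i(q) (here, forward kinematics of a
# made-up planar 2-link arm) and an energy E_i on that space; the combined
# configuration-space energy is sum_i E_i(f_i(q)).
import numpy as np

def planar_2link_fk(q, l1=1.0, l2=1.0):
    """End-effector position of a planar 2-link arm (illustrative task map)."""
    x = l1 * np.cos(q[0]) + l2 * np.cos(q[0] + q[1])
    y = l1 * np.sin(q[0]) + l2 * np.sin(q[0] + q[1])
    return np.array([x, y])

goal = np.array([1.2, 0.8])       # attractor in end-effector space
obstacle = np.array([0.8, 0.4])   # repulsor in end-effector space

def combined_energy(q):
    x = planar_2link_fk(q)
    e_goal = 0.5 * np.sum((x - goal) ** 2)                # attraction energy
    e_obs = np.exp(-5.0 * np.sum((x - obstacle) ** 2))    # repulsion energy
    e_limits = 0.01 * np.sum(q ** 2)                      # joint-space prior
    return e_goal + e_obs + e_limits

def numeric_grad(f, q, eps=1e-5):
    """Central finite-difference gradient (kept simple for the sketch)."""
    g = np.zeros_like(q)
    for i in range(len(q)):
        step = np.zeros_like(q)
        step[i] = eps
        g[i] = (f(q + step) - f(q - step)) / (2 * eps)
    return g

# Reactive step: move the configuration along the negative combined gradient.
q = np.array([0.3, 0.5])
for _ in range(200):
    q -= 0.1 * numeric_grad(combined_energy, q)
print(planar_2link_fk(q))   # end effector near the goal, pushed off the obstacle
```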

Summary

Introduction

Many robotic tasks deal with finding a control action satisfying multiple objectives. In contrast with more sequential tasks [42, 17, 18, 39], in which the objectives are satisfied by concatenating them in time, in the presented work we consider tasks in which multiple objectives must be satisfied in parallel. Previous approaches fall into two families: the first (e.g., trajectory optimization) treats the problem as an optimization or inference problem over the combined objectives, while the second (reactive motion generation) assumes a complete independence between objectives, lacking any optimality guarantees. In our approach, we compute the product of a set of stochastic policies and then compute the action maximizing this product. This approach can be understood as a probabilistic logical conjunction (AND operator) [8, 43] between the stochastic policies (see Fig. 2 for a visual representation).
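
As a rough illustration of this probabilistic conjunction (a sketch with assumed toy densities, not the authors' code), the product of policies is maximized by the action with the largest sum of log-densities. A sampling-based argmax, as below, also copes with multimodal or non-smooth components, for which no closed-form product exists.

```python
# Sketch of the probabilistic AND: the product of policies is maximized by the
# action with the highest sum of log-densities. The toy component densities
# below are assumptions made for illustration only.
import numpy as np

rng = np.random.default_rng(0)

def log_pi_goal(a):        # unimodal: prefer actions toward +1
    return -0.5 * ((a - 1.0) / 0.3) ** 2

def log_pi_safe(a):        # non-smooth: hard-limit the action magnitude
    return np.where(np.abs(a) <= 1.2, 0.0, -np.inf)

def log_pi_prior(a):       # multimodal prior over two habitual actions
    return np.logaddexp(-0.5 * ((a + 0.5) / 0.2) ** 2,
                        -0.5 * ((a - 0.9) / 0.2) ** 2)

# Sampling-based argmax over the sum of log-densities (the log of the product).
candidates = rng.uniform(-2.0, 2.0, size=5000)
scores = log_pi_goal(candidates) + log_pi_safe(candidates) + log_pi_prior(candidates)
best = candidates[np.argmax(scores)]
print(best)   # an action all three components agree on (around 0.9-1.0)
```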

