Contextual policy search methods have demonstrated the potential to generalize robotic skills on trajectory-shaping tasks. However, they remain challenging to apply to contact-rich manipulation tasks, because contact force regulation, reference trajectory adaptation, and task generalization must be achieved simultaneously. To this end, a hierarchical compliance-based contextual policy search (HC-CPS) approach is proposed to learn robotic compliant skills for force, motion, and task adaptation. Specifically, a parameterized impedance-conditioned action space is proposed for the lower-level reinforcement learning policy, which provides the compliance needed for reference motion regulation and contact force control, while a linear Gaussian contextual policy is formulated as the higher-level policy to optimize the context-conditioned impedance parameters for task generalization; together, these allow a family of contact-rich manipulation tasks with multiple objectives to be accomplished. Moreover, data efficiency is further improved in two ways: first, a variational encoder-decoder model is proposed to estimate the underlying constraints that the impedance parameters impose on the actions, which mitigates extrapolation error in off-policy learning of the lower-level policy; second, a composite forward model is proposed to generate artificial trajectories and reduce the reward bias in higher-level contextual policy learning. The HC-CPS approach is validated on three simulated manipulation tasks and on real-world dual peg-in-hole assembly tasks with two kinds of objectives, and the results demonstrate the effectiveness of HC-CPS.
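To make the two-level structure concrete, the following minimal sketch shows how a linear Gaussian contextual policy could map a task context to impedance parameters, and how those parameters could then shape a contact force through a standard one-degree-of-freedom impedance law. It is an illustration under stated assumptions, not the authors' implementation: the function names, the two-dimensional context, the diagonal covariance, and all numerical values are hypothetical, and the learned lower-level reinforcement learning policy is replaced here by a fixed impedance law.

```python
import random


def sample_impedance_params(context, bias, gain, std, rng):
    """Sample theta ~ N(bias + gain @ context, diag(std^2)).

    This is a linear Gaussian contextual policy with a diagonal covariance
    (the diagonal form is a simplifying assumption for this sketch).
    """
    mean = [
        b + sum(g * c for g, c in zip(row, context))
        for b, row in zip(bias, gain)
    ]
    return [rng.gauss(m, s) for m, s in zip(mean, std)]


def impedance_force(stiffness, damping, x_ref, x, v_ref, v):
    """Standard 1-DoF impedance law: F = K (x_ref - x) + D (v_ref - v)."""
    return stiffness * (x_ref - x) + damping * (v_ref - v)


# Hypothetical context features (for example, a clearance and a friction estimate).
context = [0.5, 0.2]
bias = [300.0, 20.0]                    # baseline stiffness and damping (assumed)
gain = [[-100.0, 50.0], [5.0, 10.0]]    # linear context-to-parameter map (assumed)
std = [10.0, 1.0]                       # exploration noise per parameter (assumed)

rng = random.Random(0)
stiffness, damping = sample_impedance_params(context, bias, gain, std, rng)

# Keep the sampled gains physically meaningful (non-negative).
stiffness, damping = max(stiffness, 0.0), max(damping, 0.0)

# Commanded force for a small tracking error at the end effector.
force = impedance_force(stiffness, damping, x_ref=0.10, x=0.08, v_ref=0.0, v=0.01)
print(f"K={stiffness:.1f}, D={damping:.2f}, F={force:.3f}")
```

In a contextual policy search setting, the `bias`, `gain`, and `std` terms would be updated from the returns of sampled rollouts, so that each new context yields suitable impedance parameters without relearning the skill from scratch.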