Flexible robots (FRs) are generally designed to be lightweight to achieve rapid motion. However, the accompanying vibrations and modeling errors degrade tracking control, especially when the reference signal is partially lost. This article develops a two-time-scale primal-dual inverse reinforcement learning (PD-IRL) framework that enables FRs to perform tracking tasks with incomplete reference signals. First, the admissible policy is treated as a nonconvex input constraint to guarantee stable operation of the equipment. Then, the FRs imitate the demonstrated behaviors of an expert, including both rigid and flexible motions, to balance tracking speed and vibration suppression. During the imitation process, the nonconvex optimization problems of the FRs are transformed into their corresponding dual problems to obtain the globally optimal policy. Moreover, exploring the state space with multiple linearly independent paths simultaneously improves the convergence speed. Convergence and stability are rigorously analyzed. Finally, simulations and comparisons demonstrate the effectiveness and superiority of the proposed method.
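For intuition only, the following minimal sketch shows a generic two-time-scale primal-dual gradient update on a toy constrained problem of the form min_x f(x) s.t. g(x) <= 0. The objective, constraint, variable names, and step sizes are illustrative assumptions and are not taken from the paper; the abstract does not specify the authors' update rules.

```python
# Generic two-time-scale primal-dual gradient sketch (illustrative assumptions only).
import numpy as np

def f(x):
    # Hypothetical nonconvex objective (stand-in for a tracking cost).
    return 0.5 * x @ x + 0.1 * np.sin(3.0 * x).sum()

def grad_f(x):
    return x + 0.3 * np.cos(3.0 * x)

def g(x):
    # Hypothetical input-magnitude constraint g(x) <= 0.
    return x @ x - 1.0

def grad_g(x):
    return 2.0 * x

x = np.array([1.5, -0.8])   # primal variable (e.g., policy parameters)
lam = 0.0                   # dual variable (Lagrange multiplier)
alpha, beta = 1e-2, 1e-3    # fast primal step vs. slow dual step: the two time scales

for _ in range(5000):
    # Fast time scale: primal descent on the Lagrangian L(x, lam) = f(x) + lam * g(x).
    x = x - alpha * (grad_f(x) + lam * grad_g(x))
    # Slow time scale: dual ascent on lam, projected onto lam >= 0.
    lam = max(0.0, lam + beta * g(x))

print("x* ~", x, " lambda* ~", lam, " g(x*) =", g(x))
```

In this sketch the primal variable moves on the faster time scale while the multiplier adapts slowly, a common way to make primal-dual iterations behave like a singularly perturbed (two-time-scale) system; the paper's actual algorithm, constraint set, and IRL reward-recovery step may differ.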