This article addresses the problem of learning the objective function of linear discrete-time systems under static output-feedback (OPFB) control by designing inverse reinforcement learning (RL) algorithms. Most existing inverse RL methods require the states and state-feedback control inputs of the expert or demonstrated system to be available. In contrast, this article considers inverse RL in a more general setting where the demonstrated system uses static OPFB control and only input-output measurements are available. We first develop a model-based inverse RL algorithm that reconstructs an input-output objective function of a demonstrated discrete-time system from its system dynamics and OPFB gain; the reconstructed objective function reproduces the demonstrations and the OPFB gain of the demonstrated system. Then, an input-output Q-function is constructed for the inverse RL problem using a state-reconstruction technique. Given demonstrated inputs and outputs, a data-driven inverse Q-learning algorithm reconstructs the objective function without knowledge of the demonstrated system dynamics or the OPFB gain. This algorithm yields unbiased solutions even in the presence of exploration noise. Convergence properties and the nonuniqueness of the solutions of the proposed algorithms are studied. Numerical simulation examples verify the effectiveness of the proposed methods.
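To make the inverse RL idea concrete, the following is a minimal, hedged sketch of the underlying principle in the simpler state-feedback special case (all states measured, i.e., the output map is the identity), not the article's OPFB or data-driven inverse Q-learning algorithm: given the system matrices and a demonstrated optimal gain, search for quadratic objective weights whose optimal controller reproduces that gain, with the input weight fixed to reflect the nonuniqueness of inverse RL solutions. All matrices, values, and names (A, B, K_exp, Q_true, etc.) are illustrative assumptions, not taken from the article.

```python
# Hedged illustrative sketch, NOT the article's algorithm: inverse RL for a linear
# discrete-time system in the state-feedback special case. The learner observes only
# the expert gain K_exp and recovers objective weights Q whose optimal LQR gain matches it.
import numpy as np
from scipy.linalg import solve_discrete_are
from scipy.optimize import minimize

# Demonstrated (expert) system: x_{k+1} = A x_k + B u_k with expert policy u_k = -K_exp x_k.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
R = np.eye(1)                     # fixed a priori: inverse RL weights are only unique up to scaling
Q_true = np.diag([2.0, 0.5])      # "hidden" objective the expert is assumed to optimize

def lqr_gain(Q):
    """Optimal state-feedback gain for the quadratic weights (Q, R)."""
    P = solve_discrete_are(A, B, Q, R)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

K_exp = lqr_gain(Q_true)          # the only information the learner sees about the objective

# Inverse step: search over candidate diagonal weights Q(theta) = diag(exp(theta))
# for the one whose optimal gain reproduces the demonstrated gain.
def gain_mismatch(theta):
    Q = np.diag(np.exp(theta))    # exponential keeps the candidate weight positive definite
    return np.linalg.norm(lqr_gain(Q) - K_exp) ** 2

res = minimize(gain_mismatch, x0=np.zeros(A.shape[0]), method="Nelder-Mead")
Q_hat = np.diag(np.exp(res.x))

print("recovered Q diagonal:", np.diag(Q_hat))                 # close to [2.0, 0.5]
print("gain error:", np.linalg.norm(lqr_gain(Q_hat) - K_exp))
```

As the abstract notes, such solutions are nonunique: any weight pair whose optimal gain matches the demonstrated gain is an equally valid explanation of the behavior, which is why the input weight is fixed here. The article's contribution goes further by handling static OPFB control and by learning from input-output data alone, without the system model or the OPFB gain.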