This paper applies reinforcement learning (RL) to find the optimal sensor/actuator placement (OSAP) policy and optimal control for a flexible wing. The “co-design” objective is to find the OSAP and its associated controller that together yield the best closed-loop performance. The nonlinear vibration dynamics of the flexible wing are modeled using the linear parameter-varying (LPV) approach so that LPV-H∞ controllers can be designed. The co-design problem is formulated as a mixed-integer semi-definite program (MISDP). As a special form of combinatorial optimization, MISDP combines integer optimization for sensor/actuator selection with convex optimization for controller design. A modified RL algorithm is applied to solve this NP-hard optimization problem and obtain a converged solution. In addition, RL is compared with the greedy algorithm and the genetic algorithm to demonstrate its strengths and drawbacks in solving high-dimensional MISDP. The solutions obtained by RL and the greedy algorithm are verified and compared in high-fidelity simulation with the full-order model.