In this paper, a continuous Q-learning algorithm is deployed to optimize the control strategy of a trailing-edge airfoil flow separation at a chord-based Reynolds number of 2×105. With plasma synthetic jets issued at the middle chord and a hot wire placed in the separated shear layer acting as the actuator and sensor, respectively, a high-speed reinforcement learning control at an interaction frequency of 500 Hz is realized by a field-programmable gate array. The results show that in the Q-learning control, the controller only needs several seconds to elevate the instantaneous reward to a level close to the final mean reward, and convergence of the control law typically takes less than 100 s. Although the relative drag reduction achieved by Q-learning control (10.2%) is only slightly higher than the best open-loop periodical control at F∗=4 (9.6%), the maximum power saving ratio is improved noticeably by 62.5%. Physically, Q-learning control creates more turbulent fluctuations, earning more rewards by increasing the transition possibilities toward high-value states. With increasing penalty strength of plasma actuation, the final control laws obtained from Q-learning exhibit a decreasing number of active states. Detailed comparisons between the open-loop and Q-learning control strategies show that the statistics of the controlled velocity fields remain similar, yet the turbulent fluctuations contributed by the vortex shedding mode are reduced by constant-frequency plasma actuation.
Read full abstract