In this experimental study, a value-based reinforcement learning algorithm (deep Q-network, DQN) is used to control the flow separation behind a backward-facing step at a Reynolds number of 2.9 × 10⁴. The flow is forced by a dielectric barrier discharge (DBD) plasma actuator mounted upstream of the step edge, and feedback on the separation zone is provided by a hot-wire sensor placed in the downstream shear layer. The control law, represented by a deep neural network, is implemented on a field-programmable gate array (FPGA) and executes in real time at frequencies up to 1000 Hz. Results show that both open-loop periodic forcing and DQN control can effectively reduce the reattachment length and the recirculation area. Compared with the former, which requires dozens of trial-and-error measurements lasting for hours, the latter is able to find an optimal control law in only two minutes, achieving a 7% higher long-term reward. Moreover, by introducing a weak penalty term on plasma actuation, the mean actuator power consumption under DQN control can be reduced to only 60% of that of the optimal open-loop control, while sacrificing a negligible amount of control effectiveness. Physically, open-loop periodic forcing destabilizes the shear layer earlier, increasing both the area and the peak amplitude of the high turbulent kinetic energy (TKE) zone, whereas under DQN control only a slight increase in the TKE peak is observed, and the overall spatial distribution remains close to that of the baseline.
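For illustration only, the penalized reward described above might take a form like the following minimal Python sketch. The choice of the hot-wire RMS fluctuation as the flow-state proxy, the function name, and the penalty weight are assumptions made for exposition, not the authors' exact definitions.

```python
def reward(u_fluctuation_rms: float, actuator_on: bool,
           penalty_weight: float = 0.05) -> float:
    """Hypothetical shaped reward for DQN separation control.

    u_fluctuation_rms : RMS of the hot-wire velocity fluctuation over the
        last control interval (assumed proxy for separation-zone activity;
        smaller is better, so it enters with a negative sign).
    actuator_on : whether the DBD plasma actuator fired during this step.
    penalty_weight : weak penalty (an assumed value) discouraging
        unnecessary actuation, which is what cuts mean power consumption.
    """
    return -u_fluctuation_rms - penalty_weight * float(actuator_on)
```

For example, `reward(0.8, True)` returns -0.85 with the default weight: the agent is rewarded for suppressing shear-layer fluctuations but pays a small cost each time it actuates, so it learns to fire the actuator only when doing so improves the flow state enough to offset the penalty.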