Supervised deep learning is poorly suited to the automatic inverse design of photonic crystal waveguides, for three reasons: (1) training the network requires a large dataset, so a great deal of time is spent on physical simulation; (2) the one-to-many mapping problem often arises during inverse design; and (3) it is difficult to obtain photonic structures whose performance exceeds that of the structures in the dataset. These problems can be addressed by deep reinforcement learning (DRL). Here, DRL is proposed for the first time for the design of complex slow-light photonic crystal waveguides. The DRL-based design took less than 72 h to obtain a photonic crystal waveguide with a normalized delay-bandwidth product (NDBP) of 0.449 and a group index of 48.72. When the scaling factor of the bandwidth term in the reward function is increased to 0.25, the bandwidth of the optimized slow-light waveguide increases to 50.01 nm, with an NDBP of 0.39. The slow-light performance of both structures is verified by finite-difference time-domain (FDTD) simulation. In addition, the NDBP is further improved to 0.46 if the structural parameters are left unconstrained. Compared with deep learning, DRL improves the sampling efficiency by more than an order of magnitude and does not suffer from the three problems listed above. This work confirms the effectiveness of DRL for inverse design in energy-band engineering, and DRL can be applied widely to the optimization of energy-band-related performance.
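To relate the reported figures to one another, it helps to recall the conventional definition of the normalized delay-bandwidth product: the average group index multiplied by the fractional bandwidth, NDBP = n_g × Δλ/λ0. The short Python sketch below applies that definition to the values quoted above. The abstract does not state the center wavelength, so the 1550 nm used here is purely an assumption for illustration, and the function names are hypothetical.

```python
def ndbp(group_index, bandwidth_nm, center_wavelength_nm):
    """Conventional NDBP: average group index times fractional bandwidth."""
    return group_index * bandwidth_nm / center_wavelength_nm


def fractional_bandwidth(ndbp_value, group_index):
    """Fractional bandwidth (delta_lambda / lambda_0) implied by an NDBP and a group index."""
    return ndbp_value / group_index


def implied_group_index(ndbp_value, bandwidth_nm, center_wavelength_nm):
    """Average group index implied by an NDBP and an absolute bandwidth."""
    return ndbp_value * center_wavelength_nm / bandwidth_nm


# ASSUMPTION: the center wavelength is not given in the abstract;
# 1550 nm is used only to turn fractional values into nanometres.
ASSUMED_CENTER_NM = 1550.0

# First structure: NDBP = 0.449, group index = 48.72 (values from the abstract).
frac = fractional_bandwidth(0.449, 48.72)
bandwidth_first_nm = frac * ASSUMED_CENTER_NM

# Second structure: NDBP = 0.39, bandwidth = 50.01 nm (values from the abstract).
ng_second = implied_group_index(0.39, 50.01, ASSUMED_CENTER_NM)

print(f"fractional bandwidth (structure 1): {frac:.5f}")
print(f"bandwidth at assumed 1550 nm (structure 1): {bandwidth_first_nm:.2f} nm")
print(f"implied group index at assumed 1550 nm (structure 2): {ng_second:.2f}")
```

The numbers derived under the 1550 nm assumption are only indicative. What the sketch makes concrete is the trade-off the two designs illustrate: for a fixed NDBP, a wider bandwidth implies a lower average group index, which is why weighting bandwidth more heavily in the reward yields a broader but less strongly slowed band.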