In wireless video streaming, particularly in complex scenarios such as video transmission from 5G-powered drones, analyzing the quality of experience (QoE) of the stream is a crucial task. In particular, attention must be paid to the dynamic interaction between QoE indicators such as buffer starvation probability and traffic load. This paper proposes a reinforcement learning-based video streaming scheduling model that learns the correlation between user behavior and traffic patterns to derive resource allocation strategies that optimize QoE indicators. Because the network state at each moment of transmission is inherently random, the model introduces exploration rewards to mitigate the noise of the stochastic environment; this mechanism also allows the model to explore the environment thoroughly even when rewards are sparse, yielding an effective scheduling strategy. Simulation experiments show that our model improves the long-term QoE of video streaming across different network environments.
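As a concrete illustration of the exploration-reward idea, the sketch below shows one common way such a mechanism can be realized: a tabular Q-learning scheduler whose action selection is biased by a count-based exploration bonus that decays as a state-action pair is visited more often. This is a minimal sketch under stated assumptions, not the paper's exact formulation; the action set `ACTIONS`, the bonus coefficient `BETA`, the `qoe_reward` terms, the state discretization, and the toy transition dynamics are all hypothetical placeholders for the real scheduler and network simulator.

```python
# Minimal sketch: Q-learning scheduler with a count-based exploration bonus.
# The QoE reward terms, state discretization, and bonus coefficient BETA are
# illustrative assumptions, not the paper's exact formulation.
import random
from collections import defaultdict
from math import sqrt

ACTIONS = [0, 1, 2]          # hypothetical resource-allocation levels (low/med/high)
ALPHA, GAMMA, BETA = 0.1, 0.95, 0.5

Q = defaultdict(float)       # Q[(state, action)] -> value estimate
N = defaultdict(int)         # visit counts driving the exploration bonus

def qoe_reward(buffer_level, traffic_load, action):
    """Illustrative extrinsic reward: penalize buffer starvation and heavy load."""
    starvation_penalty = -5.0 if buffer_level == 0 else 0.0
    load_penalty = -0.1 * traffic_load * action
    return 1.0 + starvation_penalty + load_penalty

def select_action(state):
    """Greedy w.r.t. Q plus a bonus that decays with the visit count,
    so rarely tried allocations are still explored under sparse rewards."""
    def score(a):
        return Q[(state, a)] + BETA / sqrt(N[(state, a)] + 1)
    return max(ACTIONS, key=score)

def update(state, action, reward, next_state):
    """Standard Q-learning update on the extrinsic (QoE) reward."""
    N[(state, action)] += 1
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

# Toy interaction loop with a randomly drifting network state,
# standing in for the real network simulator.
state = (3, 1)               # (discretized buffer level, discretized traffic load)
for step in range(1000):
    action = select_action(state)
    reward = qoe_reward(*state, action)
    next_state = (max(0, min(5, state[0] + action - random.randint(0, 2))),
                  random.randint(0, 3))
    update(state, action, reward, next_state)
    state = next_state
```

The bonus term `BETA / sqrt(N + 1)` keeps under-visited state-action pairs attractive early on and fades as experience accumulates, which is one way to obtain thorough exploration in a noisy environment without perturbing the QoE reward used in the value update.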