To meet the ever increasing mobile traffic demand in 5G era, base stations (BSs) have been densely deployed in radio access networks (RANs) to increase the network coverage and capacity. However, as the high density of BSs is designed to accommodate peak traffic, it would consume an unnecessarily large amount of energy if BSs are on during off-peak time. To save the energy consumption of cellular networks, an effective way is to deactivate some idle base stations that do not serve any traffic demand. In this paper, we develop a traffic-aware dynamic BS sleep control framework, named DeepBSC, which presents a novel data-driven learning approach to determine the BS active/sleep modes while meeting lower energy consumption and satisfactory Quality of Service (QoS) requirements. Specifically, the traffic demands are predicted by the proposed GS-STN model, which leverages the geographical and semantic spatial-temporal correlations of mobile traffic. With accurate mobile traffic forecasting, the BS sleep control problem is cast as a Markov Decision Process that is solved by Actor-Critic reinforcement learning methods. To reduce the variance of cost estimation in the dynamic environment, we propose a benchmark transformation method that provides robust performance indicator for policy update. To expedite the training process, we adopt a Deep Deterministic Policy Gradient (DDPG) approach, together with an explorer network, which can strengthen the exploration further. Extensive experiments with a real-world dataset corroborate that our proposed framework significantly outperforms the existing methods.