Location-based services (LBS) in vehicular ad hoc networks (VANETs) must protect user privacy against the exposure of sensitive locations during LBS requests. A user releases not only the geographical coordinates but also the semantic information of the visited places (e.g., a hospital). This sensitive information enables an inference attacker to deduce the user's preferences and life patterns. In this paper, we propose a reinforcement learning (RL) based sensitive semantic location privacy protection scheme. The scheme applies the idea of differential privacy to randomize the released vehicle locations and adaptively selects the perturbation policy based on the sensitivity of the semantic location and the attack history. In a dynamic privacy protection process, it enables a vehicle to optimize the perturbation policy with respect to privacy and quality of service (QoS) loss without knowing the current inference attack model. To solve the location protection problem with high-dimensional and continuous-valued perturbation policy variables, we develop a deep deterministic policy gradient-based semantic location perturbation scheme (DSLP). The actor generates a continuous privacy budget and perturbation angle, and the critic estimates the performance of the policy. Simulations demonstrate that the DSLP-based scheme outperforms the benchmark schemes: it increases privacy, reduces the QoS loss, and increases the utility of the vehicle.
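To make the perturbation step concrete, the following minimal sketch shows how a privacy budget and a perturbation angle could be turned into a released location. It rests on assumptions not stated in the abstract: the displacement distance is drawn from the radial part of a planar Laplace distribution (as in geo-indistinguishability), locations are planar coordinates in meters, and the name `perturb_location` and its parameters are hypothetical. In the proposed scheme the budget and angle would come from the actor; here they are passed in by hand.

```python
import math
import random


def perturb_location(x, y, epsilon, theta, rng=random):
    """Return a perturbed location and the displacement distance.

    Assumption (not from the paper): the distance r follows
    Gamma(shape=2, scale=1/epsilon), the radial part of a planar
    Laplace distribution, so a smaller epsilon (stronger privacy)
    gives a larger expected displacement of 2/epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    # random.gammavariate takes (shape, scale).
    r = rng.gammavariate(2.0, 1.0 / epsilon)
    # Move r meters from the true location in direction theta (radians).
    return x + r * math.cos(theta), y + r * math.sin(theta), r


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed only to make the demo repeatable
    # Hypothetical policy output: budget of 0.01 per meter, angle of 45 degrees.
    px, py, r = perturb_location(100.0, 200.0, 0.01, math.pi / 4, rng)
    print(f"released location: ({px:.1f}, {py:.1f}), displaced by {r:.1f} m")
```

A smaller privacy budget moves the released location farther from the true one on average, which raises privacy at the cost of a larger QoS loss. This is the trade-off that the learned policy is meant to balance.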