Traditional optimization-based voltage controllers for distribution grid applications require consumption/production values from the meters as well as accurate grid data (i.e., line impedances) for modeling purposes. These algorithms are sensitive to uncertainties, notably in consumption and production forecasts and in the grid model; this paper focuses on the latter. Indeed, line parameters gradually deviate from their original values over time due to operating and weather conditions. Moreover, these data are oftentimes not fully available on the low-voltage side, creating discrepancies between datasheet and actual values. To mitigate the impact of uncertain line parameters, this paper proposes a deep reinforcement learning algorithm for voltage regulation in a distribution grid with PV production, which controls the setpoints of distributed storage units used as flexibilities. Two algorithms are considered, namely TD3PG and PPO. A two-stage strategy is also proposed, with offline training on a grid model followed by further online training on the actual system (with distinct impedance values). The controllers’ performance is assessed with respect to the algorithms’ hyperparameters, and the obtained results are compared with a second-order conic relaxation optimization-based control. The results show the relevance of the RL-based control in terms of accuracy, robustness to gradual or sudden variations in the line impedances, and significant speed improvement once trained.
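The two-stage strategy above can be illustrated with a minimal sketch. The single-bus voltage model, the linear setpoint policy, and the finite-difference update below are all simplified stand-ins for the paper's actual grid and TD3PG/PPO training, chosen only to show the warm-start idea: a policy is first tuned offline on a grid model with assumed impedance, then refined online on a system whose impedance differs.

```python
# Hedged sketch of offline-then-online policy tuning on a toy single-bus
# feeder. All numerical values (impedances, PV profile, learning rate)
# are illustrative assumptions, not taken from the paper.

def voltage(pv, setpoint, impedance, v_base=1.02):
    # Simplified voltage rise: base voltage plus net injection times impedance.
    return v_base + impedance * (pv - setpoint)

def cost(w, impedance, pv_profile):
    # Mean squared deviation from a 1.0 p.u. voltage target,
    # with the storage setpoint given by the linear policy w * pv.
    return sum((voltage(pv, w * pv, impedance) - 1.0) ** 2
               for pv in pv_profile) / len(pv_profile)

def train(w, impedance, pv_profile, steps=200, lr=0.5, eps=1e-4):
    # Finite-difference gradient descent on the policy parameter
    # (a stand-in for the RL policy-update step).
    for _ in range(steps):
        grad = (cost(w + eps, impedance, pv_profile)
                - cost(w - eps, impedance, pv_profile)) / (2 * eps)
        w -= lr * grad
    return w

pv_profile = [0.2, 0.5, 0.8, 1.0]                    # PV injections (p.u.)
w_offline = train(0.0, 0.5, pv_profile)              # stage 1: grid model
w_online = train(w_offline, 0.6, pv_profile)         # stage 2: actual impedance
```

Because the online stage starts from the offline solution, it only needs to correct for the impedance mismatch rather than learn the policy from scratch, which is the motivation for the warm start.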