Mobile robots have been incorporated into human society to help perform tasks that can affect or endanger health and life. One of the challenges lies in mobile robot-human interaction, as unexpected movements made by people can cause collisions with robots while they accomplish a task. In this paper, an optimized algorithm based on TD3 (Twin Delayed Deep Deterministic Policy Gradient), a Deep Reinforcement Learning method, is proposed that allows the robot to take its own actions based on the observations it makes, without any trajectory being defined beforehand. The algorithm uses an actor-critic policy to determine the linear and angular velocity of the robot, which allows it to move in unknown dynamic environments while avoiding collisions. A buffer is proposed that stores the values produced by the neural network and analyses them together with the robot's odometry parameters, so that the best decision is sent to the robot to achieve a collision-free path and meet the objectives. The purpose of this algorithm is to reach as many consecutive targets as possible; that is, the robot never returns to its initial position but is assigned new targets regardless of the position it has reached. Finally, the training results show actor and critic losses converging to nearly 0 after 12,000 training episodes, and the evaluation shows a 92% effectiveness of the algorithm, based on 772 steps performed by the robot in a time of 11 s.
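The buffer described above, which stores experience produced during interaction so it can later be analysed for the robot's decisions, can be sketched as a standard experience replay buffer of the kind TD3 relies on. This is a minimal illustration, not the paper's implementation; the class and method names are hypothetical, and the transition layout (state, action, reward, next state, done flag) is assumed, with the action being the (linear velocity, angular velocity) pair chosen by the actor.

```python
# Minimal sketch of an experience replay buffer for TD3-style training.
# Names and the transition format are illustrative assumptions, not the
# paper's actual code.
import random
from collections import deque


class ReplayBuffer:
    def __init__(self, capacity=100_000):
        # Oldest transitions are evicted automatically once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        # `action` would be the (linear_velocity, angular_velocity) pair
        # produced by the actor network; `state` could include odometry.
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random minibatch, as used for the critic/actor updates.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```

Sampling uniformly at random breaks the temporal correlation between consecutive robot steps, which stabilizes the gradient updates of the actor and critic networks.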