Wireless Mobile Edge Computing (MEC) is an emerging paradigm for augmenting users’ computing capabilities, but existing offloading schemes still suffer from drawbacks such as slow learning and high task latency. To overcome these difficulties, a deep reinforcement learning (DRL)-based task offloading (DRTO) framework is proposed that accounts for the time-varying nature of wireless channels and the unpredictability of task arrivals in multi-user MEC systems. Multiple deep neural networks (DNNs) serve as a scalable means of quickly determining the best offloading strategy, combining the perceptual capability of deep learning with the decision-making capability of reinforcement learning (RL). In addition, a minimum upper bound on the target Q-value is used to mitigate the value-overestimation problem that arises during offloading policy updates. The approach aims to minimize task offloading latency under diverse channel environments, heterogeneous latency constraints across users, and limited computational resources. Simulation results show that the algorithm reduces computation time and task latency while achieving near-optimal performance compared with conventional Deep Deterministic Policy Gradient (DDPG) and Q-learning baselines. Moreover, DRTO reduces computation time by more than 99.6% relative to existing near-optimal benchmark methods. The DRTO computational offloading approach therefore offers strong potential for reducing overall system latency and improving the user experience.
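As a rough illustration of the multi-DNN decision stage described above, the sketch below has several candidate networks each propose a binary offloading decision from the current channel gains, and the decision with the lowest estimated latency is selected. This is a sketch under our own assumptions: the network sizes, the `estimated_latency` cost model, and all names (`make_dnn`, `N_USERS`, `K_DNNS`) are hypothetical and not taken from the paper.

```python
import torch
import torch.nn as nn

# Hypothetical settings (illustrative only, not from the paper).
N_USERS, K_DNNS = 10, 5

def make_dnn(n_users: int) -> nn.Module:
    """Small MLP mapping channel gains to per-user offloading probabilities."""
    return nn.Sequential(
        nn.Linear(n_users, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, n_users), nn.Sigmoid(),
    )

dnns = [make_dnn(N_USERS) for _ in range(K_DNNS)]

def estimated_latency(h: torch.Tensor, x: torch.Tensor) -> float:
    """Stand-in latency model (assumption): offloaded tasks (x=1) pay a
    transmission cost that shrinks with channel gain h; local tasks (x=0)
    pay a fixed local-computation cost."""
    return float((x / (h + 1e-6) + (1.0 - x) * 2.0).sum())

h = torch.rand(N_USERS)  # current channel gains (illustrative input)
with torch.no_grad():
    # Each DNN proposes a binary offloading vector; keep the cheapest one.
    candidates = [(dnn(h) > 0.5).float() for dnn in dnns]
best = min(candidates, key=lambda x: estimated_latency(h, x))
print("chosen offloading decision:", best.tolist())
```

In practice the selected decision and its observed cost would be stored in a replay memory to train the DNNs, but that training loop is beyond the scope of this sketch.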
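The "minimum upper bound on the target Q-value" is consistent with clipped double Q-learning; a minimal sketch of such a target, assuming two target networks $Q_{\theta'_1}$ and $Q_{\theta'_2}$ (our notation, not confirmed by the source), is

$$
y_t \;=\; r_t \;+\; \gamma \min_{j \in \{1,2\}} Q_{\theta'_j}\!\left(s_{t+1},\, a_{t+1}\right),
$$

where taking the minimum of the two estimates bounds the bootstrap target from above, counteracting the overestimation bias introduced by the max operator in standard Q-learning.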