Reinforcement learning is a branch of artificial intelligence that trains agents through trial and error. A reinforcement learning agent interacts with its environment, observes the consequences of its actions in the form of rewards or punishments, and uses the information from every interaction to update its knowledge. The problem addressed in this research is the inconsistent behavior of Non-Player Characters (agents) while exploring a game environment. This research follows the Waterfall model of the Software Development Life Cycle (SDLC) to train Non-Player Characters (agents) in the Amc Dash Mark I Game using the Deep Q Network (DQN) algorithm over several stages. Training results show that model performance improves over time. The average episode duration and average episode reward increased from 7.75 to 24.7, while the exploration rate decreased to 0.05. This indicates that the model has learned and is improving, achieving better rewards by performing fewer actions. The lower loss also shows that the model has reduced its prediction errors and improved its prediction capabilities.
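
The decreasing exploration rate mentioned above is commonly implemented with an epsilon-greedy policy, and a minimal sketch of that idea follows. It is an illustration, not the training code used in this research: only the floor of 0.05 comes from the reported results, while the starting rate of 1.0, the decay factor of 0.995, and the function names `decayed_epsilon` and `select_action` are assumptions chosen for the example.

```python
import random


def decayed_epsilon(step, eps_start=1.0, eps_min=0.05, decay=0.995):
    """Return the exploration rate after `step` decay steps.

    eps_min=0.05 matches the final exploration rate reported above;
    eps_start and decay are assumed values for illustration only.
    """
    return max(eps_min, eps_start * decay ** step)


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy selection over a list of Q-value estimates.

    With probability `epsilon` a random action index is returned
    (exploration); otherwise the index of the largest Q-value is
    returned (exploitation).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Early in training the agent acts almost entirely at random. Once the rate reaches its floor, about one action in twenty is still exploratory and the rest follow the learned Q-values.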