Modeling driver behavior can significantly improve autonomous vehicles and intelligent driver assistance and training systems. Hazard perception (HP) is an essential driving skill for preventing accidents in potentially dangerous traffic situations. HP does not naturally fit the Markov decision process formulation, which makes modeling it with reinforcement learning techniques challenging. We introduce a framework for modeling HP scenarios using the soft actor-critic approach, and we apply this framework to model HP in the scenario of a critical lateral hazard emerging onto the road. Two coupled environments, a mirror environment for training and a target environment for deployment, accelerate the learning process; the trained agent is deployed in the target environment via an interpreter algorithm. The results show that the agent learns HP and outperforms the best human results by 2.6%, while training in the mirror environment runs 40 times faster. Evaluation tests indicate a driving experience closer to real-world scenarios than to gaming simulations.
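The soft actor-critic approach referenced above trains its critics against an entropy-regularized Bellman target. As a minimal illustrative sketch (not the paper's implementation; all numbers are hypothetical), the target combines the reward, the discounted minimum of two target Q-values, and an entropy bonus:

```python
import numpy as np

def soft_q_target(reward, done, gamma, alpha, next_q1, next_q2, next_logp):
    """Soft Bellman target used by SAC critics:
    y = r + gamma * (1 - done) * (min(Q1', Q2') - alpha * log pi(a'|s'))."""
    min_q = np.minimum(next_q1, next_q2)  # clipped double-Q to reduce overestimation
    return reward + gamma * (1.0 - done) * (min_q - alpha * next_logp)

# Toy numbers for illustration only:
y = soft_q_target(reward=1.0, done=0.0, gamma=0.99, alpha=0.2,
                  next_q1=2.0, next_q2=1.5, next_logp=-0.5)
# min Q' = 1.5, entropy bonus = 0.2 * 0.5 = 0.1, so y = 1.0 + 0.99 * 1.6
```

Here `alpha` is the temperature weighting the entropy term; in a dual-environment setup like the one described, this update would run against transitions collected in the mirror environment.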