The ventilation resistance coefficients of a mine are vital to ventilation system safety management, diagnosis, and intelligentization. Measured airflow typically serves as the basis for inverting ventilation resistance coefficients; however, the non-uniqueness of solutions in conventional nonlinear optimization methods limits inversion accuracy. This study therefore introduces a novel optimization approach based on deep reinforcement learning (DRL) to invert the resistance coefficients. In this methodology, inversion is formulated as a Markov decision process, with a ventilation network solving model (VNSM) embedded in the DRL environment. We design an agent based on deep neural networks that dynamically adjusts the resistance-coefficient state variables by interacting with the VNSM so as to improve the consistency between the theoretical and measured airflow; this consistency serves as the agent's reward. Proximal policy optimization (PPO) is employed to optimize the agent's policy. In a field experiment, the MAE between the calculated and measured airflow is 0.354, with an MSE of 0.287, an RMSE of 0.536, and an MRE of 0.013. Compared with the standard genetic algorithm, the differential evolution algorithm, and the evolution strategy algorithm, the DRL method reduces the MRE, MAE, RMSE, and MSE by 23.5%, 15.3%, 14.1%, and 26.4%, respectively. In addition, the DRL method exhibits smaller sensitivity differences across different roadways than the other algorithms.
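The Markov-decision-process framing described above can be sketched in miniature. The code below is an illustration only, not the paper's implementation: the one-branch-at-a-time solver `vnsm_airflow`, the fixed pressure drop, the coefficient values, and the greedy random search standing in for the PPO-trained agent are all assumptions made for the sketch. It shows the core loop: the state is the current resistance-coefficient estimate, an action is a bounded multiplicative adjustment, and the reward is the (negated) MAE between model-predicted and measured airflow.

```python
import numpy as np

def vnsm_airflow(R):
    """Toy stand-in for the ventilation network solving model (VNSM):
    per-branch airflow Q = sqrt(h / R) with an assumed fixed pressure
    drop h = 100. A real VNSM solves the full network balance."""
    return np.sqrt(100.0 / R)

# "True" coefficients (unknown in practice) and simulated field measurements.
R_true = np.array([0.5, 1.2, 2.0])
Q_measured = vnsm_airflow(R_true)

def reward(R):
    # Agent's reward: consistency between theoretical and measured airflow,
    # expressed here as negative mean absolute error.
    return -np.abs(vnsm_airflow(R) - Q_measured).mean()

# Greedy random search stands in for the PPO policy: propose a small
# adjustment to the state, keep it only if the reward improves.
rng = np.random.default_rng(0)
R = np.ones(3)                              # initial coefficient state
best = reward(R)
for _ in range(2000):
    action = rng.uniform(-1.0, 1.0, size=3)
    R_trial = R * (1.0 + 0.1 * action)      # adjust each coefficient by <=10%
    r = reward(R_trial)
    if r > best:
        R, best = R_trial, r

best_mae = -best
print(best_mae)                             # MAE after inversion
```

In the actual method, the hand-rolled search above is replaced by a deep-neural-network agent whose policy is optimized with PPO against the embedded VNSM.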