This work explores the potential of deep reinforcement learning (DRL) as a black-box optimizer for turbulence model identification. We consider a Reynolds-averaged Navier–Stokes (RANS) closure model of a round turbulent jet flow at a Reynolds number of 10,000 and augment the widely used Spalart–Allmaras turbulence model with a source term that is identified by DRL. The algorithm is trained to maximize the alignment between the velocity fields of the augmented RANS model and time-averaged large eddy simulation (LES) reference data. Compared to the standard Spalart–Allmaras model, the DRL-augmented model improves the alignment between the RANS results and the reference data by 48%. The velocity field, jet spreading rate, and axial velocity decay exhibit substantially improved agreement with both the LES reference and literature data. In addition, we apply the trained model to a jet flow at a Reynolds number of 15,000, where it improves the mean-field alignment by 35%, demonstrating that the framework is applicable to unseen data of the same configuration at a higher Reynolds number. Overall, this work demonstrates that DRL is a promising method for RANS closure model identification. Hurdles and challenges associated with the presented methodology, such as high numerical cost, numerical stability, and sensitivity to hyperparameters, are also discussed.