Model-based control is one of the most prevalent techniques for designing and controlling engineering systems. However, many of these systems are complex and characterized by changing dynamics. Hence, online system identification is required to achieve optimal adaptive control performance for such complex systems. This work proposes an algorithm for nonintrusive, online, nonlinear parameter estimation of physical models using deep reinforcement learning (RL). The problem of training a neural network for parameter estimation is formulated as a reinforcement learning problem. The RL-based parameter estimation policy is tested on a simulation of the selective hydrogenation of acetylene, a highly nonlinear system. The learned model estimation policy correctly predicts the states of the system with a prediction error below 1% under various conditions, including measurement noise and structural model mismatch.
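To illustrate the framing described above (not the paper's actual method or system), the following toy sketch casts online parameter estimation as a reinforcement learning loop: the environment is a simple first-order plant with unknown parameters, the agent's action is an additive adjustment to its current parameter estimate, and the reward is the negative squared one-step prediction error. The plant model, step sizes, and annealing schedule here are all illustrative assumptions.

```python
import numpy as np

# Toy sketch (assumed setup, not the paper's method): the "environment" is
# y[k+1] = a*y[k] + b*u[k] with unknown (a, b). The agent's action perturbs
# its parameter estimate; the reward is the negative squared one-step
# prediction error, and improving actions are accepted greedily.

rng = np.random.default_rng(0)
true_theta = np.array([0.8, 0.5])           # unknown (a, b) to identify

def plant_step(y, u, theta):
    a, b = theta
    return a * y + b * u

theta_hat = np.zeros(2)                      # agent's current estimate
y = 1.0

for k in range(3000):
    sigma = 0.2 * 0.999 ** k                 # annealed exploration noise
    u = rng.normal()                         # excitation input
    y_next = plant_step(y, u, true_theta)    # measured system output

    # Action: random perturbation of the estimate; accept if the reward
    # (negative squared prediction error) improves for this sample.
    cand = theta_hat + rng.normal(0.0, sigma, size=2)
    r_cur = -(plant_step(y, u, theta_hat) - y_next) ** 2
    r_cand = -(plant_step(y, u, cand) - y_next) ** 2
    if r_cand > r_cur:
        theta_hat = cand
    y = y_next

print(theta_hat)  # should approach [0.8, 0.5]
```

In the paper's setting, a deep RL algorithm would replace this greedy perturb-and-accept rule with a learned neural-network policy, but the reward structure (penalizing prediction error against measured states) plays the same role.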