Abstract

Optical solitons in mode-locked fiber lasers and optical communication links have a wide range of applications. A crucial step in studying the transmission modes of optical solitons is to use deep learning techniques to investigate the relationship between the parameters of the governing equations and soliton evolution. However, current parameter identification models can search only a limited parameter domain and are strongly influenced by initialization, which often causes the search to drift toward incorrect parameter values. This work uses reinforcement learning to redesign the iterative process of the parameter identification model. With a two-stage optimization strategy, the model can conduct an accurate parameter search across arbitrary domains. A series of experiments on various standard and higher-order equations shows that the proposed model overcomes the influence of initialization on the parameter search and that the identified parameters are guided toward their correct values. The enhanced model markedly improves experimental efficiency and holds significant promise for advancing research on soliton propagation dynamics and for handling more complex scenarios.