Pain is a multifaceted sensory and emotional experience, often linked to tissue damage, that carries substantial healthcare costs and profoundly affects patient well-being. In intensive care units, effective pain management is paramount. However, determining suitable dosages of primary pain management drugs such as morphine remains challenging, because the appropriate dose depends on diverse patient-specific factors, including cardiovascular responses and pain intensity. To date, only one study has explored personalized pain treatment recommendations through reinforcement learning. That pioneering study was limited by incomplete observations of the patient state, a restricted action space, and its use of Deep Q-Networks, which are known for their sample inefficiency and lack of clinical interpretability. In this work, we introduce a Conservative Q-learning-based recommendation system for pain management with expanded state and action spaces. We also develop a comprehensive pipeline for qualitative and quantitative evaluation of the trained model's performance. Our findings indicate a slight performance improvement over the clinician's policy, and the learned policy is more clinically sensible and understandable than current state-of-the-art methods.