Reinforcement learning in large, structured action spaces: A simulation study of decision support for spinal cord injury rehabilitation

Nathan Phelps, Stephanie Marrocco, Stephanie Cornell, Dalton L Wolfe, Daniel J Lizotte

doi:10.1016/j.ibmed.2024.100137

Open Access

https://doi.org/10.1016/j.ibmed.2024.100137

Journal: Intelligence-Based Medicine
Publication Date: Jan 1, 2024
License type: cc-by-nc-nd

Affiliation: Western University

Abstract

Reinforcement learning (RL) has helped improve decision-making in several domains but can be challenging to apply; this is the case for rehabilitation of people with a spinal cord injury (SCI). Among other factors, applying RL in this domain is difficult because there are many possible treatments (i.e., a large action space) and few detailed records of longitudinal treatments and outcomes (i.e., limited training data). Applying Fitted Q Iteration in this domain with linear models and the most natural state and action representation results in problems with convergence and overfitting. However, isolating treatments from one another can mitigate the convergence issue, and treatments for SCIs have meaningful groupings that can be used to combat overfitting. We propose two approaches to grouping treatments so that an RL agent can learn effectively from limited data. One relies on domain knowledge of SCI rehabilitation and the other learns similarities among treatments using an embedding technique. After re-interpreting the data using these treatment grouping approaches in conjunction with our process that isolates the treatment groups, we use Fitted Q Iteration to train an agent that learns to select better treatments. Through a simulation study designed to reflect the properties of SCI rehabilitation, we find that agents trained after using either grouping method can help improve the treatment decisions of individual physiotherapists, but the approach based on domain knowledge offers better performance. Our findings provide a proof of concept that applying RL has the potential to help improve the treatment of those with an SCI and indicate that continued efforts to gather data and apply RL to this domain are worthwhile.
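
For readers unfamiliar with Fitted Q Iteration, the following is a minimal, generic sketch of the batch algorithm with linear models, not the authors' implementation. It assumes a small discrete action set and fits one ridge-regularized linear model per action; the function names (`fitted_q_iteration`, `greedy_action`), the ridge penalty, and all default parameter values are hypothetical choices made for illustration only.

```python
import numpy as np

def fitted_q_iteration(states, actions, rewards, next_states, terminal,
                       n_actions, gamma=0.9, n_iters=50, ridge=1e-3):
    """Batch FQI with one ridge-regularized linear Q-model per action.

    All argument names and defaults are illustrative assumptions.
    states, next_states: arrays of shape (n, d)
    actions: integer array of shape (n,), values in [0, n_actions)
    rewards, terminal: arrays of shape (n,); terminal is 1.0 at episode end
    """
    n, d = states.shape
    X = np.hstack([states, np.ones((n, 1))])            # append a bias feature
    X_next = np.hstack([next_states, np.ones((n, 1))])
    W = np.zeros((n_actions, d + 1))                    # Q(s, a) = W[a] @ [s, 1]
    for _ in range(n_iters):
        # Bellman targets are built from the previous iteration's Q estimate.
        q_next = X_next @ W.T                           # shape (n, n_actions)
        targets = rewards + gamma * (1.0 - terminal) * q_next.max(axis=1)
        W_new = W.copy()
        for a in range(n_actions):
            mask = actions == a
            if not mask.any():
                continue                                # no data for this action
            Xa, ya = X[mask], targets[mask]
            # Closed-form ridge regression on the transitions that used action a.
            A = Xa.T @ Xa + ridge * np.eye(d + 1)
            W_new[a] = np.linalg.solve(A, Xa.T @ ya)
        W = W_new
    return W

def greedy_action(W, state):
    """Return the action with the highest estimated Q-value in `state`."""
    return int(np.argmax(W @ np.append(state, 1.0)))
```

Calling `fitted_q_iteration` on a batch of logged transitions returns a weight matrix with one row per action, and `greedy_action` then selects the action with the highest estimated value. Because each action gets its own model here, an action with few recorded uses is fit from very little data, which is one way to see why a large action space combined with limited data invites overfitting.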

Full Text