Abstract

In recent years, the importance of artificial intelligence (AI) and reinforcement learning (RL) in healthcare, and in particular for learning Dynamic Treatment Regimes (DTRs), has grown rapidly. These techniques are used to learn and recover the best of the clinicians' treatment policies. However, existing approaches face limitations: behavior cloning (BC) methods suffer from compounding errors, while reinforcement learning (RL) techniques rely on self-defined reward functions that are either too sparse or require clinical guidance. Inverse reinforcement learning (IRL) was introduced to address these limitations; in IRL, the reward function is learned from expert demonstrations. In this paper, we propose an IRL approach for recovering the true reward function underlying expert demonstrations. Results show that rewards obtained through the proposed technique enable faster learning in an existing RL model than self-defined rewards.
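
To make the core idea concrete, the sketch below illustrates one common way a reward function can be learned from expert demonstrations: a linear reward over state features whose weights are updated by feature matching, in the spirit of maximum-entropy IRL. This is an illustrative assumption, not the paper's actual algorithm; all names, the feature representation, and the placeholder trajectories are hypothetical.

```python
# Hypothetical minimal sketch of IRL-style reward learning from expert
# demonstrations via feature matching (illustrative only; not the
# paper's actual method).
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_FEATURES = 10, 4
state_features = rng.random((N_STATES, N_FEATURES))  # phi(s), assumed known

# Expert demonstrations: lists of visited states (placeholder data).
expert_trajectories = [rng.integers(0, N_STATES, size=20) for _ in range(5)]

def mean_feature_counts(trajectories):
    """Average accumulated feature counts over a set of trajectories."""
    feats = np.vstack([state_features[t].sum(axis=0) for t in trajectories])
    return feats.mean(axis=0)

mu_expert = mean_feature_counts(expert_trajectories)

theta = np.zeros(N_FEATURES)  # reward weights: R(s) = theta . phi(s)
lr = 0.05

for _ in range(100):
    # In a full implementation the learner's trajectories would be drawn
    # from the policy that is optimal under the current reward; random
    # trajectories are used here purely as a stand-in.
    learner_trajectories = [rng.integers(0, N_STATES, size=20) for _ in range(5)]
    mu_learner = mean_feature_counts(learner_trajectories)

    # Gradient step on the feature-matching objective: increase reward on
    # states the expert visits more often than the learner does.
    theta += lr * (mu_expert - mu_learner)

learned_reward = state_features @ theta  # per-state reward estimate
```

The learned per-state reward can then be handed to a standard RL algorithm in place of a hand-crafted reward, which is the role the recovered reward plays in the experiments summarized above.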
