Abstract

We are interested in conscious decision-making systems for environments in which multiple types of rewards and penalties exist. Two known approaches to this problem are a method that uses a basis function and a method that uses a penalty avoidance list. Although the penalty avoidance list method is considered the more promising of the two, it has a drawback: when all actions are registered in the avoidance list, actions are selected at random. In this paper, we propose a method that selects actions using deep reinforcement learning so as to avoid such random selection as much as possible. The effectiveness of the proposed method is confirmed by numerical experiments.
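
The idea of replacing the random fallback can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `select_action`, the dictionary `q_values` (standing in for the outputs of a trained deep network), and the greedy choice among non-avoided actions are all assumptions made for illustration only.

```python
def select_action(actions, avoid_list, q_values):
    """Pick an action, preferring those not on the penalty avoidance list.

    actions    -- all actions available in the current state
    avoid_list -- actions registered as penalty-inducing (hypothetical structure)
    q_values   -- action -> estimated value; a stand-in for a deep network's output
    """
    # Actions that are not registered in the avoidance list.
    allowed = [a for a in actions if a not in avoid_list]
    if allowed:
        # Assumption: choose greedily among the non-avoided actions.
        return max(allowed, key=lambda a: q_values[a])
    # Every action is on the avoidance list. Instead of selecting at random,
    # fall back to the action with the highest estimated value.
    return max(actions, key=lambda a: q_values[a])


if __name__ == "__main__":
    actions = ["left", "right", "stay"]
    q_values = {"left": 0.1, "right": -0.5, "stay": -0.2}
    print(select_action(actions, {"left"}, q_values))
    print(select_action(actions, set(actions), q_values))
```

In this sketch the learned value estimates are consulted only to rank actions, so the avoidance list still takes priority whenever at least one action remains outside it.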
