Abstract

In this research, we focus on the application of reinforcement learning (RL) to automated agent tasks involving considerable target variability (i.e., targets characterized by stochastic distributions), in particular the learning of inspect/correct tasks. Examples include automated identification and correction of rivet failures in airplane maintenance procedures, and automated cleaning of surgical instruments in a hospital sterilization processing department. The location of defects and the corrective action required for each vary from one task episode to the next. The agent must therefore learn an optimal stochastic strategy rather than optimize for any single defect type and location. RL has been widely applied in robotics and autonomous agents research, but primarily to problems with relatively low variability compared to the overall task requirements. We characterize the performance of RL at varying levels of variability in a grid world environment at different task complexity levels, and analyze the RL performance problems seen during the experiments. The experiments show that higher variability in stochastic environments significantly reduces the RL agent's performance because of forgetting (or overwriting) effects: the most recent observation from the stochastic environment unduly influences learned behavior. Furthermore, we characterize the impact of variability on hyperparameter selection. To help mitigate the impact of variability on RL performance, we developed a chain of Q-tables approach aimed at reducing the impact of subtask variability on other subtasks within a training episode. The performance of the chain of Q-tables approach was assessed against the original SARSA RL and the double SARSA approach. In high and very high variability cases, the chain of Q-tables approach outperforms the others in terms of efficiency, accumulated reward, number of steps, and computational time.
An adaptive hyperparameter setting method was developed based on a sample variability metric. The approach quickly estimates the environmental variability and automatically sets appropriate hyperparameter values.
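
For readers unfamiliar with the baseline, the following is a minimal sketch of tabular SARSA in a grid world where the defect location is resampled every episode. It is not the authors' implementation: the grid size, reward values, candidate defect cells, and all function names (`step`, `choose`, `train_sarsa`) are assumptions chosen for illustration, and the chain of Q-tables and adaptive hyperparameter methods are not reproduced here.

```python
import random

# Hypothetical action set: right, left, down, up.
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]


def step(pos, action, defect, size):
    """Move inside the grid (walls clamp the move); reaching the defect ends the episode."""
    r = min(max(pos[0] + action[0], 0), size - 1)
    c = min(max(pos[1] + action[1], 0), size - 1)
    new_pos = (r, c)
    done = new_pos == defect
    reward = 10.0 if done else -1.0  # assumed reward scheme
    return new_pos, reward, done


def choose(q, pos, epsilon, rng):
    """Epsilon-greedy action index for the current state."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    values = q[pos]
    return values.index(max(values))


def train_sarsa(size=4, candidates=((3, 3),), episodes=200, alpha=0.1,
                gamma=0.95, epsilon=0.1, max_steps=100, seed=0):
    """Tabular SARSA; the defect cell is drawn from `candidates` each episode."""
    rng = random.Random(seed)
    q = {(r, c): [0.0] * len(ACTIONS)
         for r in range(size) for c in range(size)}
    steps_per_episode = []
    for _ in range(episodes):
        defect = rng.choice(candidates)  # source of episode-to-episode variability
        pos = (0, 0)
        a = choose(q, pos, epsilon, rng)
        steps = 0
        for steps in range(1, max_steps + 1):
            nxt, reward, done = step(pos, ACTIONS[a], defect, size)
            a_next = choose(q, nxt, epsilon, rng)
            # On-policy target: uses the action actually selected in the next state.
            target = reward if done else reward + gamma * q[nxt][a_next]
            q[pos][a] += alpha * (target - q[pos][a])
            pos, a = nxt, a_next
            if done:
                break
        steps_per_episode.append(steps)
    return q, steps_per_episode
```

Passing a single cell in `candidates` gives a fixed-target environment, while passing several cells makes the target vary between episodes. Because a single Q-table is updated from whichever defect location was sampled last, this setup is one simple way to observe the kind of overwriting effect the abstract describes.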
