Abstract

The assembly industry is shifting toward customizable products and small-batch assembly. This demands frequent reprogramming, which is expensive because it requires a specialized engineer. It would be an improvement if untrained workers could help a cobot learn an assembly sequence by giving advice. Learning an assembly sequence is a hard task for a cobot, because the solution space grows drastically as the complexity of the task increases. This work introduces a novel method in which human knowledge is used to reduce this solution space and, as a result, increase the learning speed. The proposed method, IRL-PBRS, uses Interactive Reinforcement Learning (IRL) to learn from human advice interactively, and uses Potential Based Reward Shaping (PBRS), in a simulated environment, to focus learning on a smaller part of the solution space. The method was compared in simulation against two other feedback strategies. The results show that IRL-PBRS converges more quickly to a valid assembly sequence policy and does so with the fewest human interactions. Finally, a use case is presented in which participants were asked to program an assembly task. Here, the results show that IRL-PBRS learns quickly enough to keep up with advice given by a user and is able to adapt online to a changing knowledge base.

Highlights

  • In recent years, the prices of industrial cobots have dropped significantly

  • The results show that Interactive Reinforcement Learning (IRL) with Potential Based Reward Shaping (PBRS) converges more quickly to a valid assembly sequence policy and does so with the fewest human interactions

  • The results show that IRL-FB is not an option because it does not converge to a valid policy within a reasonable amount of time


Introduction

The prices of industrial cobots (and robots) have dropped significantly. Replacing the specialized engineer with untrained workers (without programming skills) would reduce the cost of production, but it must become possible for these workers to program the cobots. In such a setting, where a cobot collaborates with a human [3,4], it would be useful if the human could optimize the behavior of their cobot partner. Interactive Reinforcement Learning (IRL) is used, where the reward signal of the RL agent depends on the environment and the state, as well as on the advice from a human. IRL results in optimal productivity because human and cobot work together, while the autonomy of the cobot increases over time and the workload of the human decreases.
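The core mechanism can be sketched in a few lines. The snippet below is a minimal illustration of potential-based reward shaping driven by human advice, assuming a tabular setting with named partial-assembly states; the state names and advice format are hypothetical, not the paper's actual implementation.

```python
GAMMA = 0.9  # discount factor (assumed value)

# Human advice defines a potential over partial-assembly states:
# states the worker approves of receive a higher potential, so the
# shaping term steers exploration toward that part of the solution space.
advice_potential = {
    "empty": 0.0,
    "base": 1.0,
    "base+frame": 2.0,
    "base+frame+cover": 3.0,
}

def phi(state):
    """Potential of a state; states without advice default to 0."""
    return advice_potential.get(state, 0.0)

def shaped_reward(r, state, next_state):
    """PBRS: r' = r + gamma * phi(s') - phi(s).

    Because the shaping term is a difference of potentials, the optimal
    policy of the original task is preserved, and advice can be updated
    online simply by editing `advice_potential`.
    """
    return r + GAMMA * phi(next_state) - phi(state)

# Moving toward an advised state earns a shaping bonus;
# straying to an unadvised state is penalized.
print(shaped_reward(0.0, "base", "base+frame"))  # 0.9 * 2.0 - 1.0 = 0.8
print(shaped_reward(0.0, "base", "unadvised"))   # 0.0 - 1.0 = -1.0
```

Because the environment reward is left untouched, the cobot still converges to a valid assembly sequence even when the advice is incomplete; the advice only concentrates exploration.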

