Partially observable deep reinforcement learning for multi-agent strategy optimization of human-robot collaborative disassembly: A case of retired electric vehicle battery

Jiaxu Gao, Guoxian Wang, Jinhua Xiao, Pai Zheng, Eujin Pei

doi:10.1016/j.rcim.2024.102775

Abstract

The rapid growth of the electric vehicle (EV) industry has driven a corresponding surge in the consumption of EV batteries, whose recycling and disassembly are currently labor-intensive and inefficient. Human-robot collaboration (HRC) is a promising way to improve the efficiency and safety of EV battery disassembly. However, because of the uncertainty of retired EV battery disassembly and the inefficiency of existing disassembly sequences, the task is difficult to accomplish fully through HRC. The collaborative disassembly of EV batteries by humans and robots can be conceptualized as agents interacting with and learning from the environment, and modeled as a multi-agent Markov game. This paper addresses the challenge of HRC in EV battery disassembly by recognizing the dual attributes of partial observability and non-smoothness in the disassembly scenario. A partially observable multi-agent reinforcement learning environment is constructed, incorporating the structure of the EV battery and the disassembly task. Building on the QMIX architecture (a value-based multi-agent deep reinforcement learning algorithm), the framework is extended to the QMIX-HRC algorithm, specifically designed to tackle the sequencing problem in human-robot collaborative disassembly of EV batteries. The optimization yields a task sequence that offers the maximal global co-benefit found during the exploration iterations, reducing labor costs and improving collaborative efficiency. The viability of the QMIX-HRC disassembly strategy is verified by executing the resulting disassembly sequence on a simulated battery pack at a real human-robot collaborative disassembly station.

Full Text