Enhance pick-and-place performance using multimodal interaction in operation environment

Xinwei Guo, Yang Chen

doi:10.1108/ir-10-2022-0260

Abstract

Purpose

The vision and depth information obtained from an eye-to-hand RGB-D camera can be used to reconstruct the three-dimensional (3D) environment of a robotic operation workspace. The reconstructed 3D space gives robots and humans a symmetrical and equal observation view and can be considered a digital twin (DT) environment. Although artificial intelligence (AI) techniques already perform well for robotic operation in known environments, the purpose of this study is to enhance robot skill in the physical workspace.

Design/methodology/approach

A multimodal interaction framework is proposed for DT operation environments.

Findings

A fast image-based target segmentation technique is combined with the 3D reconstruction of the robotic operation environment from the eye-to-hand camera, expediting generation of the 3D DT environment without accuracy loss. A multimodal interaction interface is integrated into the DT environment.

Originality/value

Users can operate the virtual objects in the DT environment using speech, mouse and keyboard simultaneously. The human operations in the 3D DT virtual space are recorded and provide cues for the robot’s operations in practice.

Full Text