Markerless vision-based teleoperation that leverages innovations in computer vision offers the advantage of natural and noninvasive finger motions for controlling multifingered robot hands. However, current pose estimation methods still suffer from inaccuracy caused by self-occlusion of the fingers. Herein, we develop a novel vision-based hand-arm teleoperation system that captures the human hand from an optimal viewpoint and at a suitable distance. This teleoperation system consists of an end-to-end hand pose regression network and a controlled active vision system. The end-to-end pose regression network, Transteleop, observes the human hand through a low-cost depth camera and predicts the robot's joint commands based on an image-to-image translation approach, aided by an auxiliary reconstruction loss function. To obtain an optimal observation of the human hand, an active vision system, implemented with a robot arm at the local site, ensures the high accuracy of the proposed neural network. Human arm motions are simultaneously mapped to the slave robot arm under relative control. Quantitative network evaluation and a variety of complex manipulation tasks, for example, tower building, pouring, and multitable cup stacking, demonstrate the practicality and stability of the proposed teleoperation system.
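To illustrate the idea of an end-to-end pose regression network trained with an auxiliary reconstruction loss, the sketch below shows one possible PyTorch structure: a shared depth-image encoder feeding both a joint-command regression head and an image-to-image reconstruction decoder. This is not the authors' implementation; the layer sizes, input resolution, joint count (`num_joints=22`), and loss weighting (`alpha`) are illustrative assumptions, since the abstract does not specify them.

```python
# Minimal sketch (not the paper's actual architecture) of a pose regression
# network with an auxiliary image-reconstruction branch. All dimensions and
# hyperparameters below are assumptions for illustration only.
import torch
import torch.nn as nn

class PoseRegressionSketch(nn.Module):
    def __init__(self, num_joints=22):  # assumed joint count for a multifingered hand
        super().__init__()
        # Encoder: compresses a 1-channel depth crop of the human hand.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),   # 96 -> 48
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 48 -> 24
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(), # 24 -> 12
        )
        # Decoder: auxiliary image-to-image branch that reconstructs a depth
        # image from the shared embedding (the auxiliary reconstruction task).
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )
        # Regression head: maps the shared embedding to robot joint commands.
        self.joint_head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 12 * 12, 256), nn.ReLU(),
            nn.Linear(256, num_joints),
        )

    def forward(self, depth):
        feat = self.encoder(depth)
        return self.joint_head(feat), self.decoder(feat)


def training_loss(pred_joints, gt_joints, recon, target_img, alpha=0.1):
    """Joint regression loss plus a weighted auxiliary reconstruction loss."""
    joint_loss = nn.functional.mse_loss(pred_joints, gt_joints)
    recon_loss = nn.functional.l1_loss(recon, target_img)
    return joint_loss + alpha * recon_loss


# Example forward pass on a batch of 96x96 depth crops of the human hand.
model = PoseRegressionSketch()
depth = torch.randn(8, 1, 96, 96)
joints, recon = model(depth)
print(joints.shape, recon.shape)  # torch.Size([8, 22]) torch.Size([8, 1, 96, 96])
```

The design choice reflected here is that the reconstruction branch only shapes the shared embedding during training; at teleoperation time, only the encoder and the joint head would be needed to map a depth image to joint commands.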