Interactive imitation learning for dexterous robotic manipulation: challenges and perspectives—a survey
Dexterous manipulation is a crucial yet highly complex challenge in humanoid robotics, demanding precise, adaptable, and sample-efficient learning methods. As humanoid robots are usually designed to operate in human-centric environments and interact with everyday objects, mastering dexterous manipulation is critical for real-world deployment. Traditional approaches, such as reinforcement learning and imitation learning, have made significant strides, but they often struggle due to the unique challenges of real-world dexterous manipulation, including high-dimensional control, limited training data, and covariate shift. This survey provides a comprehensive overview of these challenges and reviews existing learning-based methods for real-world dexterous manipulation, spanning imitation learning, reinforcement learning, and hybrid approaches. A promising yet underexplored direction is interactive imitation learning, where human feedback actively refines a robot’s behavior during training. While interactive imitation learning has shown success in various robotic tasks, its application to dexterous manipulation remains limited. To address this gap, we examine current interactive imitation learning techniques applied to other robotic tasks and discuss how these methods can be adapted to enhance dexterous manipulation. By synthesizing state-of-the-art research, this paper highlights key challenges, identifies gaps in current methodologies, and outlines potential directions for leveraging interactive imitation learning to improve dexterous robotic skills.
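Interactive imitation learning is commonly instantiated as a DAgger-style loop, in which the learner's own rollouts are relabeled by a queried expert and the policy is retrained on the aggregated data. The sketch below is a minimal, generic illustration of that loop; the environment, expert, and policy interfaces are stand-ins, not taken from any surveyed paper:

```python
import random

def dagger(env, expert, policy, iterations=5, horizon=20, beta0=1.0):
    """DAgger-style interactive imitation learning sketch.

    Each iteration rolls out a mixture of expert and learner actions,
    queries the expert for the correct action in every visited state,
    and retrains the learner on the aggregated dataset.
    """
    dataset = []
    for i in range(iterations):
        beta = beta0 * (0.5 ** i)           # decaying expert-mixing coefficient
        state = env.reset()
        for _ in range(horizon):
            expert_action = expert(state)   # interactive query on a visited state
            dataset.append((state, expert_action))
            # act with the expert w.p. beta, otherwise with the current learner
            action = expert_action if random.random() < beta else policy.act(state)
            state = env.step(action)
        policy.fit(dataset)                 # supervised learning on the aggregate
    return policy
```

The key property, in contrast to behavioral cloning, is that expert labels are collected on states the *learner* visits, which directly targets the covariate-shift problem the survey highlights.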
- Conference Article
46
- 10.1109/iros45743.2020.9340947
- Oct 24, 2020
Dexterous manipulation of objects in virtual environments with our bare hands, using only a depth sensor and a state-of-the-art 3D hand pose estimator (HPE), is challenging. While virtual environments are governed by physics, e.g. object weights and surface friction, the absence of force feedback makes the task difficult, as even slight inaccuracies on fingertips or contact points from the HPE may make the interactions fail. Prior works simply generate contact forces in the direction of the fingers' closure when finger joints penetrate virtual objects. Although useful for simple grasping scenarios, they cannot be applied to dexterous manipulations such as in-hand manipulation. Existing reinforcement learning (RL) and imitation learning (IL) approaches train agents that learn skills using task-specific rewards, without considering any online user input. In this work, we propose to learn a model that maps noisy input hand poses to target virtual poses, introducing the contacts needed to accomplish the tasks in a physics simulator. The agent is trained in a residual setting using a model-free hybrid RL+IL approach. A 3D hand pose estimation reward is introduced, leading to improved HPE accuracy when the physics-guided corrected target poses are remapped to the input space. As the model corrects HPE errors by applying minor but crucial joint displacements for contacts, the generated motion stays visually close to the user input. Since HPE sequences performing successful virtual interactions do not exist, a data generation scheme to train and evaluate the system is proposed. We test our framework in two applications that use hand pose estimates for dexterous manipulation: hand-object interactions in VR and hand-object motion reconstruction in the wild.
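The residual formulation described above — the agent outputs small joint displacements added to the noisy estimated pose, rather than an absolute pose — is a simple structure to write down. The function below is purely illustrative (the clamping scale and the policy interface are assumptions, not the paper's model):

```python
def residual_correction(estimated_pose, residual_policy, scale=0.05):
    """Apply a small learned correction on top of a noisy hand-pose estimate.

    estimated_pose: list of joint angles from the hand pose estimator.
    residual_policy: maps a pose to a correction vector of the same length.
    scale: caps each correction so the motion stays visually close to the input.
    """
    delta = residual_policy(estimated_pose)
    return [p + max(-scale, min(scale, d))
            for p, d in zip(estimated_pose, delta)]
```

Clamping the residual is one way to encode the paper's stated goal of "minor but crucial" displacements: the corrected pose can restore contacts without visibly departing from the user's motion.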
- Conference Article
5
- 10.1109/icdl49984.2021.9515637
- Aug 23, 2021
The success of deep reinforcement learning approaches for learning dexterous manipulation skills strongly hinges on the rewards assigned to actions during task execution. The usual approach is to handcraft the reward function, but due to the high complexity of dexterous manipulation, defining the reward demands a large engineering effort for each particular task. To avoid this burden, we use an inverse reinforcement learning (IRL) approach to automatically learn the reward function from samples obtained from demonstrations of desired behaviours. We have identified that the rewards learned with existing IRL approaches are strongly biased towards demonstrated actions due to the scarcity of samples in the vast state-action space of dexterous manipulation applications. This significantly hinders performance due to unreliable reward estimates in regions unexplored during demonstration. We use statistical tools for random sample generation and reward normalization to reduce this bias. We show that this approach improves the learning stability and transferability of IRL for dexterous manipulation tasks. Project page: https://sites.google.com/view/irl-for-dexterous-hand
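One way to read the debiasing idea in this abstract: augment the demonstration samples with randomly generated state-action pairs and normalize the learned reward over the combined set, so that unexplored regions are not assigned arbitrarily extreme values. A minimal NumPy sketch, where the reward model, sampling bounds, and dimensionality are illustrative assumptions rather than the paper's actual procedure:

```python
import numpy as np

def normalized_reward(reward_fn, demo_samples, n_random=1000,
                      low=-1.0, high=1.0, dim=4, seed=0):
    """Return a reward function z-normalized over the demonstrations plus
    random state-action samples drawn uniformly from the space bounds."""
    rng = np.random.default_rng(seed)
    random_samples = rng.uniform(low, high, size=(n_random, dim))
    pool = np.vstack([demo_samples, random_samples])
    rewards = np.array([reward_fn(x) for x in pool])
    mu, sigma = rewards.mean(), rewards.std() + 1e-8
    # affine normalization preserves the reward ordering but bounds its scale
    return lambda x: (reward_fn(x) - mu) / sigma
```

Because the normalization statistics include random off-demonstration samples, a reward that is extreme only in unexplored regions gets pulled toward the mean, which is one plausible mechanism for the improved stability the abstract reports.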
- Conference Article
5
- 10.1109/humanoids53995.2022.10000161
- Nov 28, 2022
Dexterous manipulation with anthropomorphic robot hands remains a challenging problem in robotics because of the high-dimensional state and action spaces and complex contacts. Nevertheless, skillful closed-loop manipulation is required to enable humanoid robots to operate in unstructured real-world environments. Reinforcement learning (RL) has traditionally imposed enormous interaction data requirements for optimizing such complex control problems. We introduce a new framework that leverages recent advances in GPU-based simulation along with the strength of imitation learning in guiding policy search towards promising behaviors to make RL training feasible in these domains. To this end, we present an immersive virtual reality teleoperation interface designed for interactive human-like manipulation on contact rich tasks and a suite of manipulation environments inspired by tasks of daily living. Finally, we demonstrate the complementary strengths of massively parallel RL and imitation learning, yielding robust and natural behaviors. Videos of trained policies, our source code, and the collected demonstration datasets are available at https://maltemosbach.github.io/interactive_human_like_manipulation/.
- Book Chapter
71
- 10.1007/978-3-031-19842-7_33
- Jan 1, 2022
While significant progress has been made on understanding hand-object interactions in computer vision, it is still very challenging for robots to perform complex dexterous manipulation. In this paper, we propose a new platform and pipeline DexMV (Dexterous Manipulation from Videos) for imitation learning. We design a platform with: (i) a simulation system for complex dexterous manipulation tasks with a multi-finger robot hand and (ii) a computer vision system to record large-scale demonstrations of a human hand conducting the same tasks. In our novel pipeline, we extract 3D hand and object poses from videos, and propose a novel demonstration translation method to convert human motion to robot demonstrations. We then apply and benchmark multiple imitation learning algorithms with the demonstrations. We show that the demonstrations can indeed improve robot learning by a large margin and solve the complex tasks which reinforcement learning alone cannot solve. Code and videos are available at https://yzqin.github.io/dexmv. Keywords: Dexterous manipulation, Learning from human demonstration, Reinforcement learning
- Research Article
38
- 10.1016/j.aei.2022.101787
- Oct 1, 2022
- Advanced Engineering Informatics
The reinforcement and imitation learning paradigms have the potential to revolutionise robotics. Many successful developments have been reported in the literature; however, these approaches have not been explored widely in robotics for construction. The objective of this paper is to consolidate, structure, and summarise research knowledge at the intersection of robotics, reinforcement learning, and construction. A two-strand approach to the literature review was employed: a bottom-up approach that analysed a selected number of relevant publications in detail, and a top-down approach in which a large number of papers were analysed to identify common relevant themes and research trends. This study found that research on robotics for construction has not increased significantly since the 1980s in terms of number of publications. Also, robotics for construction lacks the development of dedicated systems, which limits their effectiveness. Moreover, unlike manufacturing, construction's unstructured and dynamic characteristics are a major challenge for reinforcement and imitation learning approaches. This paper provides a useful starting point for understanding research on robotics for construction by (i) identifying the strengths and limitations of the reinforcement and imitation learning approaches, and (ii) contextualising the construction robotics problem; both of which will help kick-start research on the subject or boost existing research efforts.
- Conference Article
3
- 10.1109/aim.2019.8868401
- Jul 1, 2019
With increasing demands from industrial and daily usage, dexterous robot manipulation in unstructured environments has become an important technology for making robots behave smartly and interact with people intelligently. To address the challenge of efficiently teaching tasks to a robot with multiple degrees of freedom (DoF) for dexterous manipulation, this paper develops a mirrored motion remapping method that enables a human operator to teach a dual-arm robot face-to-face through direct human motion using handheld HTC VIVE controllers. The implementation results show that mirror-mapping-based telemanipulation enables an operator to visualize on-site conditions and understand the robot's actions well while telemanipulating a dual-arm robot. Guided by this human-in-the-loop telemanipulation, the robot efficiently completes a variety of complex tasks involving dexterous dual-arm manipulation. Such a method supports face-to-face teaching of dual-arm robots and will enhance their capabilities for more dexterous manipulation skills in the near future.
- Dissertation
3
- 10.1184/r1/8397962.v1
- Jul 2, 2019
Different from classic supervised learning, Reinforcement Learning (RL) is fundamentally interactive: an autonomous agent must learn how to behave in an unknown, uncertain, and possibly hostile environment by actively interacting with it to collect useful feedback and improve its sequential decision-making ability. The RL agent also intervenes in the environment: the agent makes decisions which in turn affect the further evolution of the environment. Because of its generality (most machine learning problems can be viewed as special cases), RL is hard. As there is no direct supervision, one central challenge in RL is how to explore an unknown environment and collect useful feedback efficiently. In recent RL success stories (e.g., super-human performance on video games [Mnih et al., 2015]), we notice that most rely on random exploration strategies, such as ε-greedy. Similarly, policy gradient methods such as REINFORCE [Williams, 1992] perform exploration by injecting randomness into the action space and hoping the randomness leads to a good sequence of actions that achieves high total reward. The theoretical RL literature has developed more sophisticated algorithms for efficient exploration (e.g., [Azar et al., 2017]); however, the sample complexity of these near-optimal algorithms scales exponentially with key parameters of the underlying system, such as the dimensions of the state and action spaces. Such exponential dependence prohibits a direct application of these theoretically elegant RL algorithms to large-scale applications. In summary, without further assumptions, RL is hard, both in practice and in theory. In this thesis, we attempt to gain purchase on the RL problem by introducing additional assumptions and sources of information. The first contribution of this thesis comes from improving RL sample complexity via imitation learning.
By leveraging expert demonstrations, imitation learning significantly simplifies the task of exploration. We consider two settings in this thesis: the interactive imitation learning setting, where an expert is available to query during training, and imitation learning from observation alone, where we only have a set of demonstrations consisting of observations of the expert's states (no expert actions are recorded). We study, both in theory and in practice, how one can imitate experts to reduce sample complexity compared to a pure RL approach. The second contribution comes from model-free Reinforcement Learning. Specifically, we study policy evaluation by building a general reduction from policy evaluation to no-regret online learning, an active research area with a well-established theoretical foundation. This reduction creates a new family of algorithms for provably correct policy evaluation under very weak assumptions on the generating process. We then provide a thorough theoretical and empirical study of two model-free exploration strategies: exploration in action space and exploration in parameter space. The third contribution of this work comes from model-based Reinforcement Learning. We provide the first exponential sample-complexity separation between model-based RL and general model-free RL approaches. We then provide a PAC model-based RL algorithm that achieves sample efficiency simultaneously for many interesting MDPs, such as tabular MDPs, factored MDPs, Lipschitz continuous MDPs, low-rank MDPs, and Linear Quadratic Control. We also provide a more practical model-based RL framework, called Dual Policy Iteration (DPI), which integrates optimal control, model learning, and imitation learning. Furthermore, we show a general convergence analysis that extends the existing approximate policy iteration theories to DPI.
DPI generalizes and provides the first theoretical foundation for recent successful practical RL algorithms such as ExIt and AlphaGo Zero [Anthony et al., 2017; Silver et al., 2017], and provides a theoretically sound and practically efficient way of unifying model-based and model-free RL approaches.
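The random-exploration baselines this thesis abstract contrasts with (ε-greedy and REINFORCE-style action noise) are simple to state. A minimal ε-greedy sketch over a vector of Q-values, for a toy tabular setting rather than anything from the thesis itself:

```python
import random

def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    """Pick the argmax action with probability 1 - epsilon,
    and a uniformly random action otherwise."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))       # explore: uniform random action
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit
```

The abstract's point is precisely that this kind of undirected randomness, while adequate for some benchmarks, gives no sample-complexity guarantees in large state-action spaces, motivating imitation and model-based structure as extra sources of information.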
- Book Chapter
11
- 10.5772/4796
- Jun 1, 2007
We have presented the newly developed anthropomorphic robot hand named the KH Hand type S and its master slave system using the bilateral controller. The use of an elastic body has improved the robot hand in terms of weight, the backlash of the transmission, and friction between the gears. We have demonstrated the expression of the Japanese finger alphabet. We have also shown an experiment of a peg-in-hole task controlled by the bilateral controller. These results indicate that the KH Hand type S has a higher potential than previous robot hands in performing not only hand shape display tasks but also in grasping and manipulating objects in a manner like that of the human hand. In our future work, we are planning to study dexterous grasping and manipulation by the robot.
- Conference Article
44
- 10.15607/rss.2020.xvi.039
- Jul 12, 2020
Autonomous driving has achieved significant progress in recent years, but autonomous cars are still unable to tackle high-risk situations where a potential accident is likely. In such near-accident scenarios, even a minor change in the vehicle's actions may result in drastically different consequences. To avoid unsafe actions in near-accident scenarios, we need to fully explore the environment. However, reinforcement learning (RL) and imitation learning (IL), two widely-used policy learning methods, cannot model rapid phase transitions and are not scalable to fully cover all the states. To address driving in near-accident scenarios, we propose a hierarchical reinforcement and imitation learning (H-ReIL) approach that consists of low-level policies learned by IL for discrete driving modes, and a high-level policy learned by RL that switches between different driving modes. Our approach exploits the advantages of both IL and RL by integrating them into a unified learning framework. Experimental results and user studies suggest our approach can achieve higher efficiency and safety compared to other methods. Analyses of the policies demonstrate our high-level policy appropriately switches between different low-level policies in near-accident driving situations.
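The two-level structure described above can be sketched generically: a set of low-level policies (one per driving mode, e.g. trained by IL) and a high-level policy that selects a mode at each step. The class below is an illustrative skeleton of that decomposition, not the H-ReIL implementation; the mode names and switching rule in the usage are hypothetical:

```python
class HierarchicalPolicy:
    """High-level policy picks a discrete mode; the selected low-level
    policy (e.g. imitation-learned per driving mode) emits the action."""

    def __init__(self, high_level, low_level_policies):
        self.high_level = high_level          # state -> mode index (RL-trained)
        self.low_level = low_level_policies   # list of state -> action policies

    def act(self, state):
        mode = self.high_level(state)         # discrete mode decision
        return self.low_level[mode](state)    # delegate to the chosen skill
```

The design point the paper exploits is that the high-level action space is tiny (a handful of modes), so RL only has to learn rare, safety-critical switches rather than continuous control.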
- Research Article
- 10.1126/scirobotics.ads6790
- Aug 20, 2025
- Science robotics
Humans use diverse skills and strategies to effectively manipulate various objects, ranging from dexterous in-hand manipulation (fine motor skills) to complex whole-body manipulation (gross motor skills). The latter involves full-body engagement and extensive contact with various body parts beyond just the hands, where the compliance of our skin and muscles plays a crucial role in increasing contact stability and mitigating uncertainty. For robots, synthesizing these contact-rich behaviors has fundamental challenges because of the rapidly growing combinatorics inherent to this amount of contact, making explicit reasoning about all contact interactions intractable. We explore the use of example-guided reinforcement learning to generate robust whole-body skills for the manipulation of large and unwieldy objects. Our method's effectiveness is demonstrated on Toyota Research Institute's Punyo robot, a humanoid upper body with highly deformable, pressure-sensing skin. Training was conducted in simulation with only a single example motion per object manipulation task, and policies were easily transferred to hardware owing to domain randomization and the robot's compliance. The resulting agent can manipulate various everyday objects, such as a water jug and large boxes, in a similar fashion to the example motion. In addition, we show blind dexterous whole-body manipulation, relying solely on proprioceptive and tactile feedback without object pose tracking. Our analysis highlights the critical role of compliance in facilitating whole-body manipulation with humanoid robots.
- Conference Article
18
- 10.1109/humanoids43949.2019.9034991
- Oct 1, 2019
Although deep Reinforcement Learning (RL) has been successfully applied to a variety of tasks, manually designing appropriate reward functions for tasks as complex as robotic cloth manipulation remains challenging and costly. In this paper, we explore a Generative Adversarial Imitation Learning (GAIL) approach for robotic cloth manipulation tasks, which allows an agent to learn near-optimal behaviors from expert demonstration and self-exploration without explicit reward function design. Building on the recent success of value-function based RL with a discrete action set for robotic cloth manipulation tasks [1], we develop a novel value-function based imitation learning framework, P-GAIL. P-GAIL employs a modified value-function based deep RL method, Entropy-maximizing Deep P-Network, that considers both smoothness and causal entropy in the policy update. After investigating its effectiveness on a toy problem in simulation, P-GAIL is applied to a dual-arm humanoid robot tasked with flipping a handkerchief, and it successfully learns a policy close to the human demonstration with limited exploration and demonstration. Experimental results suggest both fast and stable imitation learning and sample efficiency of P-GAIL in robotic cloth manipulation.
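GAIL's core mechanism — a discriminator trained to separate expert from agent state-action pairs, whose output acts as a learned reward — can be sketched compactly. The function below shows one common choice of GAIL-style surrogate reward; the discriminator interface is an assumption for illustration and is not P-GAIL's actual network:

```python
import math

def gail_reward(discriminator, state, action, eps=1e-8):
    """Surrogate reward from a GAIL-style discriminator.

    `discriminator(state, action)` returns the estimated probability that
    the pair came from the expert; the agent is rewarded for fooling it.
    """
    d = min(max(discriminator(state, action), eps), 1.0 - eps)  # numeric safety
    return -math.log(1.0 - d)   # one common GAIL reward shaping choice
```

The reward grows as the agent's behavior becomes indistinguishable from the expert's, which is what lets methods in this family dispense with hand-designed task rewards.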
- Dissertation
- 10.25534/tuprints-00014184
- Oct 16, 2020
Interactive Machine Learning for Assistive Robots
- Research Article
19
- 10.3389/fnbot.2022.861825
- Apr 25, 2022
- Frontiers in Neurorobotics
With the increasing demand for dexterity in robotic operation, dexterous manipulation with multi-fingered robotic hands using reinforcement learning is an interesting subject in robotics research. Our purpose is to present a comprehensive review of techniques for dexterous manipulation with multi-fingered robotic hands, from the early model-based approaches without learning to the latest research and methodologies based on reinforcement learning and its variations. This work summarizes the evolution and state of the art in this field and provides an overview of the current challenges and future directions in a way that allows future researchers to understand this field.
- Book Chapter
- 10.1007/978-3-031-32606-6_7
- Jan 1, 2023
This paper presents a user-friendly method for programming humanoid robots without the need for expert knowledge. We propose a combination of imitation learning and reinforcement learning to teach and optimize demonstrated trajectories. An initial trajectory for reinforcement learning is generated using a stable whole-body motion imitation system. The acquired motion is then refined using a stochastic optimal control-based reinforcement learning algorithm called Policy Improvement with Path Integrals with Covariance Matrix Adaptation (PI²-CMA). We tested the approach for programming humanoid robot reaching motion. Our experimental results show that the proposed approach is successful at learning reaching motions while preserving the postural balance of the robot. We also show how a stable humanoid robot trajectory learned in simulation can be effectively adapted to different dynamic environments, e.g. a different simulator or a real robot. The resulting learning methodology allows for quick and efficient optimization of the demonstrated trajectories while also taking into account the constraints of the desired task. The learning methodology was tested in a simulated environment and on the real humanoid robot TALOS.
- Research Article
13
- 10.3389/frobt.2020.00097
- Sep 24, 2020
- Frontiers in Robotics and AI
The ability to learn new tasks by sequencing already known skills is an important requirement for future robots. Reinforcement learning is a powerful tool for this, as it allows a robot to learn and improve how to combine skills for sequential tasks. However, in real robotic applications, the costs of sample collection and exploration prevent the application of reinforcement learning to a variety of tasks. To overcome these limitations, human input during reinforcement learning can be beneficial to speed up learning, guide exploration, and prevent the choice of disastrous actions. Nevertheless, there is a lack of experimental evaluations of multi-channel interactive reinforcement learning systems solving robotic tasks with input from inexperienced human users, in particular for cases where human input might be partially wrong. Therefore, in this paper, we present an approach that incorporates multiple human input channels for interactive reinforcement learning in a unified framework and evaluate it on two robotic tasks with 20 inexperienced human subjects. To enable the robot to also handle potentially incorrect human input, we incorporate a novel concept of self-confidence, which allows the robot to question human input after an initial learning phase. The second robotic task is specifically designed to investigate whether this self-confidence enables the robot to make learning progress even if the human input is partially incorrect. Further, we evaluate how humans react to the robot's suggestions once it notices that human input might be wrong. Our experimental evaluations show that our approach can successfully incorporate human input to accelerate the learning process in both robotic tasks, even if it is partially wrong. However, not all humans were willing to accept the robot's suggestions or its questioning of their input, particularly if they did not understand the learning process and the reasons behind the robot's suggestions.
We believe that the findings from this experimental evaluation can be beneficial for the future design of algorithms and interfaces of interactive reinforcement learning systems used by inexperienced users.
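The self-confidence mechanism described in this abstract — trusting human advice early and questioning it once the robot's own value estimates mature — can be caricatured with a simple gate. The visit threshold and Q-table interface below are illustrative assumptions, not the paper's algorithm:

```python
class ConfidenceGate:
    """Accept human action advice while experience is low; once enough
    interactions have accumulated, question advice that contradicts the
    agent's own value estimates."""

    def __init__(self, q_table, min_visits=50):
        self.q = q_table            # maps state -> {action: estimated value}
        self.visits = 0
        self.min_visits = min_visits

    def choose(self, state, human_advice):
        self.visits += 1
        own_best = max(self.q[state], key=self.q[state].get)
        if self.visits < self.min_visits or human_advice == own_best:
            return human_advice     # early phase, or advice agrees: trust human
        return own_best             # later phase: question contradictory advice
```

In the paper's richer setting, the confidence estimate is learned rather than a fixed visit count, and the robot explains its doubt to the user rather than silently overriding; the gate above only captures the switch itself.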