Towards balanced behavior cloning from imbalanced datasets

Abstract

Robots should be able to learn complex behaviors from human demonstrations. In practice, these human-provided datasets are inevitably imbalanced: i.e., the human demonstrates some subtasks more frequently than others. State-of-the-art methods default to treating each element of the human's dataset as equally important. So if—for instance—the majority of the human's data focuses on reaching a goal, and only a few state-action pairs move to avoid an obstacle, the learning algorithm will place greater emphasis on goal reaching. More generally, misalignment between the relative amounts of data and the importance of that data causes fundamental problems for imitation learning approaches. In this paper we analyze and develop learning methods that automatically account for mixed datasets. We formally prove that imbalanced data leads to imbalanced policies when each state-action pair is weighted equally; these policies emulate the most represented behaviors, and not the human's complex, multi-task demonstrations. We next explore algorithms that rebalance offline datasets (i.e., reweight the importance of different state-action pairs) without human oversight. Reweighting the dataset can enhance the overall policy performance. However, there is no free lunch: each method for autonomously rebalancing brings its own pros and cons. We formulate these advantages and disadvantages, helping other researchers identify when each type of approach is most appropriate. We conclude by introducing a novel meta-gradient rebalancing algorithm that addresses the primary limitations behind existing approaches. Our experiments show that dataset rebalancing leads to better downstream learning, improving the performance of general imitation learning algorithms without requiring additional data collection. See our project website: https://collab.me.vt.edu/data_curation/.
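The reweighting the abstract describes can be made concrete with a per-pair weighted behavior-cloning loss. This is a minimal numpy sketch of the idea, not the paper's meta-gradient algorithm; the 90/10 subtask split and all variable names are invented for illustration:

```python
import numpy as np

def weighted_bc_loss(pred_actions, expert_actions, weights):
    """Per-pair weighted behavior-cloning (MSE) loss.

    Standard BC is the special case weights == 1 everywhere; the
    rebalancing idea is to up-weight state-action pairs from
    under-represented subtasks so rare behaviors are not drowned out.
    """
    per_pair = np.mean((pred_actions - expert_actions) ** 2, axis=-1)
    return np.average(per_pair, weights=weights)

# Toy imbalanced dataset: 90 "reach the goal" pairs, 10 "avoid the
# obstacle" pairs (shapes and the 90/10 split are invented).
rng = np.random.default_rng(0)
pred = rng.normal(size=(100, 2))      # policy actions
expert = rng.normal(size=(100, 2))    # demonstrated actions

uniform = np.ones(100)                        # every pair weighted equally
rebalanced = np.where(np.arange(100) < 90,    # equalize the two subtasks'
                      1.0 / 90, 1.0 / 10)     # total loss contribution

loss_uniform = weighted_bc_loss(pred, expert, uniform)
loss_rebalanced = weighted_bc_loss(pred, expert, rebalanced)
```

Under the uniform weighting the 90 majority pairs dominate the gradient; under the rebalanced weighting each subtask contributes equally, which is the behavior the abstract argues for.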

Similar Papers
  • Conference Article
  • Cited by 6
  • 10.1109/ssci50451.2021.9660156
Generative Adversarial Imitation Learning for End-to-End Autonomous Driving on Urban Environments
  • Dec 5, 2021
  • Gustavo Claudio Karl Couto + 1 more

Autonomous driving is a complex task that has been tackled since the first self-driving car, ALVINN, in 1989, with a supervised learning approach, or behavioral cloning (BC). In BC, a neural network is trained with state-action pairs that constitute a training set made by an expert, i.e., a human driver. However, this type of imitation learning does not take into account the temporal dependencies that might exist between actions taken at different moments of a navigation trajectory. These types of tasks are better handled by reinforcement learning (RL) algorithms, which require defining a reward function. On the other hand, more recent approaches to imitation learning, such as Generative Adversarial Imitation Learning (GAIL), can train policies without explicitly requiring a reward function, allowing an agent to learn by trial and error directly from a training set of expert trajectories. In this work, we propose two variations of GAIL for autonomous navigation of a vehicle in the realistic CARLA simulation environment for urban scenarios. Both use the same network architecture, which processes high-dimensional image input from three frontal cameras and nine other continuous inputs representing the velocity, the next point of the sparse trajectory, and a high-level driving command. We show that both are capable of imitating the expert trajectory from start to end after training, but the GAIL loss function augmented with BC outperforms plain GAIL in terms of convergence time and training stability.

  • Research Article
  • Cited by 41
  • 10.1109/tvt.2022.3150343
An Integrated Decision-Making Framework for Highway Autonomous Driving Using Combined Learning and Rule-Based Algorithm
  • Apr 1, 2022
  • IEEE Transactions on Vehicular Technology
  • Can Xu + 4 more

To address the manual labelling, long-tail effect, and driving conservatism of existing decision-making algorithms, this paper proposes an integrated decision-making framework (IDF) for highway autonomous vehicles. First, states of the highway traffic are extracted from the velocity, time headway (TH), and the probabilistic lane distribution of the surrounding vehicles. With the extracted traffic state, reinforcement learning (RL) is adopted to learn the optimal state-action pair for each specific scenario. By mapping millions of traffic scenarios in this way, huge numbers of state-action pairs can be stored in the experience pool. Imitation learning (IL) is then employed to memorize the experience pool with deep neural networks. The learning results show that the accuracy of the decision network can reach 94.17%. Besides, for some imperfect decisions of the network, a rule-based method rectifies them by judging the long-term reward. Finally, the IDF is simulated on the G25 highway with promising results: it can consistently drive the vehicle to a high-efficiency state while ensuring safety.

  • Book Chapter
  • Cited by 24
  • 10.1007/978-3-540-39917-9_11
Graph Kernels and Gaussian Processes for Relational Reinforcement Learning
  • Jan 1, 2003
  • Thomas Gärtner + 2 more

Relational reinforcement learning is a Q-learning technique for relational state-action spaces. It aims to enable agents to learn how to act in an environment that has no natural representation as a tuple of constants. In this case, the learning algorithm used to approximate the mapping between state-action pairs and their so-called Q(uality)-value has to be not only very reliable, but also able to handle the relational representation of state-action pairs. In this paper we investigate the use of Gaussian processes to approximate the quality of state-action pairs. In order to employ Gaussian processes in a relational setting we use graph kernels as the covariance function between state-action pairs. Experiments conducted in the blocks world show that Gaussian processes with graph kernels can compete with, and often improve on, regression trees and instance-based regression as a generalisation algorithm for relational reinforcement learning.

  • Research Article
  • Cited by 72
  • 10.1007/s10994-006-8258-y
Graph kernels and Gaussian processes for relational reinforcement learning
  • May 8, 2006
  • Machine Learning
  • Kurt Driessens + 2 more

RRL is a relational reinforcement learning system based on Q-learning in relational state-action spaces. It aims to enable agents to learn how to act in an environment that has no natural representation as a tuple of constants. For relational reinforcement learning, the learning algorithm used to approximate the mapping between state-action pairs and their so-called Q(uality)-value has to be very reliable, and it has to be able to handle the relational representation of state-action pairs. In this paper we investigate the use of Gaussian processes to approximate the Q-values of state-action pairs. In order to employ Gaussian processes in a relational setting we propose graph kernels as a covariance function between state-action pairs. The standard prediction mechanism for Gaussian processes requires a matrix inversion, which can become unstable when the kernel matrix has low rank. These instabilities can be avoided by employing QR-factorization, which leads to better and more stable performance of the algorithm and a more efficient incremental update mechanism. Experiments conducted in the blocks world and with the Tetris game show that Gaussian processes with graph kernels can compete with, and often improve on, regression trees and instance-based regression as a generalization algorithm for RRL.
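The numerical point about low-rank kernel matrices can be illustrated in a few lines. This is a hypothetical numpy sketch, not the paper's RRL implementation: a near-singular Gram matrix (standing in for a graph-kernel matrix) is solved through a QR factorization instead of an explicit inverse:

```python
import numpy as np

# Toy Gram (kernel) matrix of rank ~2: five "state-action" points
# whose kernel is an inner product of 2-D feature vectors, plus a
# tiny jitter. Directly inverting such a matrix is ill-conditioned.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0], [0.0, 2.0]])
K = X @ X.T + 1e-8 * np.eye(5)
y = np.array([1.0, -1.0, 0.0, 2.0, -2.0])   # toy Q-value targets

# QR route: K = Q R, then solve R @ alpha = Q^T y via least squares
# instead of forming K^{-1} explicitly.
Q, R = np.linalg.qr(K)
alpha = np.linalg.lstsq(R, Q.T @ y, rcond=None)[0]

# The GP posterior mean at the training inputs is K @ alpha, which
# should reproduce the targets despite the near-singular kernel matrix.
pred = K @ alpha
```

The least-squares solve on the triangular factor tolerates the tiny singular values that would blow up a naive inverse, which is the stability argument the abstract makes.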

  • Research Article
  • Cited by 4
  • 10.1016/j.neunet.2025.107286
Episodic Memory-Double Actor-Critic Twin Delayed Deep Deterministic Policy Gradient.
  • Jul 1, 2025
  • Neural Networks: The Official Journal of the International Neural Network Society
  • Man Shu + 4 more


  • Research Article
  • Cited by 2
  • 10.1109/tase.2022.3231386
How Useful is Learning in Mitigating Mismatch Between Digital Twins and Physical Systems?
  • Jan 1, 2024
  • IEEE Transactions on Automation Science and Engineering
  • Constantin Cronrath + 1 more

In the control of complex systems, we observe two diametrical trends: model-based control derived from digital twins, and model-free control through AI. There are also attempts to bridge the gap between the two by incorporating learning-based AI algorithms into digital twins to mitigate mismatches between the digital twin model and the physical system. One of the most straightforward approaches to this is direct input adaptation. In this paper, we ask whether it is useful to employ a generic learning algorithm in such a setting, and our conclusion is "not very". We deem one algorithm more useful than another based on three aspects: 1) it requires fewer data samples to reach a desired minimal performance, 2) it achieves better performance for a reasonable number of data samples, and 3) it accumulates less regret. In our evaluation, we randomly sample problems from an industrially relevant geometry assurance context and measure the aforementioned performance indicators for 16 different algorithms. Our conclusion is that blackbox optimization algorithms, designed to leverage specific properties of the problem, generally perform better than generic learning algorithms, once again finding that "there is no free lunch". Note to Practitioners: Digital twins have the potential to improve productivity and quality in complex systems such as manufacturing systems. Their impact on system performance hinges on the accuracy of their digital models around the system's operating points. Phenomena that are difficult to measure, such as wear and tear of equipment, may however cause a mismatch between the digital twin model and the physical system. In this paper, we formalize this problem and compare 16 potential solution strategies under practical aspects. We argue that readily available off-the-shelf blackbox optimization algorithms may prove more useful for this problem than more recent learning-based approaches. Specifically, gradient-based algorithms will perform best in systems with high-dimensional, continuous, and non-linear performance functions, even in the presence of white measurement noise.

  • Conference Article
  • Cited by 2
  • 10.1109/apcase.2015.13
Cost-Sensitive Learning for Imbalanced Bad Debt Datasets in Healthcare Industry
  • Jul 1, 2015
  • Donghui Shi + 2 more

Research using computational intelligence methods to improve bad debt recovery is imperative due to the rapid increase in the cost of healthcare in the U.S. This study explores the effectiveness of cost-sensitive learning methods for classifying unknown cases in imbalanced bad debt datasets, and compares the results with those of two other methods often used for imbalanced datasets: undersampling and oversampling. The study also analyzes the behavior of a semi-supervised learning algorithm in different circumstances. The results show that although the predictive accuracy rates with oversampling on balanced testing datasets are the best, this approach is impractical because classes remain imbalanced in real healthcare situations. Models constructed by undersampling have high classification accuracy rates for the minority class in imbalanced datasets, but they tend to worsen the overall classification accuracy rates for the majority class. Cost-sensitive learning methods can improve the classification accuracy rates of the minority class in imbalanced datasets while achieving considerably good overall and majority-class accuracy rates. The results and analysis in this study show that cost-sensitive learning methods provide a potentially viable approach to classifying the unknown cases in imbalanced bad debt datasets. Finally, more practical predictive results are obtained by using the models to predict the unlabeled cases. Although oversampling and cost-sensitive learning methods with semi-supervised learning can improve the overall and majority-class classification accuracy rates, the minority-class classification accuracy rates remain relatively low; the semi-supervised learning algorithms need to be improved to adapt to imbalanced bad debt datasets.
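The cost-sensitive idea the abstract describes amounts to shifting the decision threshold according to misclassification costs rather than resampling the data. A minimal sketch, assuming a hypothetical 2x2 cost matrix (the 5x minority-class penalty and the probability values are invented for illustration):

```python
import numpy as np

# Cost matrix: rows = true class, cols = predicted class.
# Missing a minority-class case (class 1, e.g. "recoverable bad debt")
# is assumed to cost 5x as much as a false alarm on the majority class.
C = np.array([[0.0, 1.0],
              [5.0, 0.0]])

def cost_sensitive_predict(p1, cost):
    """Pick the class with minimum expected misclassification cost.

    p1 is the model's probability of class 1. With plain 0/1 loss the
    threshold is 0.5; the cost matrix shifts it to
    cost[0, 1] / (cost[0, 1] + cost[1, 0]).
    """
    p = np.stack([1 - p1, p1], axis=-1)   # P(true class) per sample
    expected_cost = p @ cost              # expected cost of each prediction
    return expected_cost.argmin(axis=-1)

probs = np.array([0.10, 0.20, 0.40, 0.60, 0.90])
plain = (probs >= 0.5).astype(int)            # standard threshold 0.5
sensitive = cost_sensitive_predict(probs, C)  # threshold 1/6 ~ 0.167
```

With these costs the threshold drops from 0.5 to about 0.167, so borderline minority-class cases get flagged, which is how cost-sensitive learning raises minority-class accuracy without discarding majority-class data.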

  • Research Article
  • 10.32877/bt.v8i2.2908
Sentiment Analysis of YouTube Comments on Free Lunch Program Using Machine Learning
  • Dec 10, 2025
  • bit-Tech
  • Bagus Satrio Pringgodani + 1 more

In the digital era, social media has become a primary platform for the public to express opinions, including reactions to governmental initiatives such as Indonesia's "Free Lunch" program. This study aims to systematically analyze public sentiment toward the program by leveraging YouTube comment data, providing a data-driven perspective on public perception. Comment data were automatically retrieved using the YouTube Data API v3 and underwent comprehensive text preprocessing, including data cleaning, case folding, normalization, stopword removal, and stemming. The preprocessed text data were classified into positive, negative, and neutral sentiments using two machine learning algorithms: K-Nearest Neighbor (KNN) and Naïve Bayes. Algorithm performance was systematically evaluated using a confusion matrix and standard classification metrics such as accuracy, precision, recall, and F1-score. Experimental results demonstrated that the Naïve Bayes classifier achieved higher precision (66%), recall (66%), and accuracy (66%), outperforming KNN in classifying sentiments within imbalanced datasets. Conversely, KNN showed more stable yet lower accuracy (39%) performance when sentiment distribution was relatively balanced. This study highlights the importance of thorough preprocessing and careful algorithm selection to improve sentiment classification accuracy from informal, user-generated content, especially within the Indonesian language context. The findings provide critical insights for policymakers, emphasizing the value of machine learning as a robust, empirical approach to evaluating public opinion.
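The Naïve Bayes classifier the study applies can be sketched end-to-end on a toy bag-of-words corpus. The comments and vocabulary below are invented placeholders, not the study's data, and the model is a generic multinomial Naïve Bayes with add-one smoothing rather than the authors' exact pipeline:

```python
import numpy as np

# Toy comments (invented placeholders); 1 = positive, 0 = negative.
docs = ["program bagus sekali", "sangat membantu anak",
        "program buruk", "tidak suka program ini"]
labels = np.array([1, 1, 0, 0])

vocab = sorted({w for d in docs for w in d.split()})
idx = {w: i for i, w in enumerate(vocab)}

def vectorize(doc):
    """Bag-of-words count vector over the shared vocabulary."""
    v = np.zeros(len(vocab))
    for w in doc.split():
        if w in idx:
            v[idx[w]] += 1
    return v

X = np.array([vectorize(d) for d in docs])

def fit_nb(X, y):
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""
    priors, likes = [], []
    for c in (0, 1):
        Xc = X[y == c]
        priors.append(np.log(len(Xc) / len(X)))
        counts = Xc.sum(axis=0) + 1.0       # add-one smoothing
        likes.append(np.log(counts / counts.sum()))
    return np.array(priors), np.array(likes)

def predict_nb(doc, priors, likes):
    """Class with the highest log posterior for a new comment."""
    scores = priors + likes @ vectorize(doc)
    return int(scores.argmax())

priors, likes = fit_nb(X, labels)
label = predict_nb("program bagus", priors, likes)   # classified as positive (1)
```

The same skeleton extends to a third (neutral) class and to real preprocessing (case folding, stemming, stopword removal) as described in the abstract.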

  • Research Article
  • Cited by 4
  • 10.1016/j.geodrs.2024.e00821
Soil textural class modeling using digital soil mapping approaches: Effect of resampling strategies on imbalanced dataset predictions
  • Jun 15, 2024
  • Geoderma Regional
  • Fereshteh Mirzaei + 4 more


  • Research Article
  • Cited by 8
  • 10.1177/10732748231167958
Differentiation of Bone Metastasis in Elderly Patients With Lung Adenocarcinoma Using Multiple Machine Learning Algorithms
  • Jan 1, 2023
  • Cancer Control : Journal of the Moffitt Cancer Center
  • Cheng-Mao Zhou + 3 more

Objective: We tested the performance of general machine learning and joint machine learning algorithms in the classification of bone metastasis in patients with lung adenocarcinoma. Methods: We used R version 3.5.3 for statistical analysis of the general information, and Python to construct machine learning models. Results: We first used the average classifiers of the 4 machine learning algorithms to rank the features; the results showed that race, sex, whether they had surgery, and marriage were the first 4 factors affecting bone metastasis. Machine learning results in the training group: for area under the curve (AUC), except for RF and LR, the AUC values of all machine learning classifiers were greater than .8, but the joint algorithm did not improve the AUC for any single machine learning algorithm. Among the results related to accuracy and precision, the accuracy of all machine learning classifiers except the RF algorithm was higher than 70%, and only the precision of the LGBM algorithm was higher than 70%. Machine learning results in the test group: similarly, for area under the curve (AUC), except for RF and LR, the AUC values for all machine learning classifiers were greater than .8, but the joint algorithm did not improve the AUC value for any single machine learning algorithm. For accuracy, except for the RF algorithm, the accuracy of the other machine learning classifiers was higher than 70%. The highest precision, for the LGBM algorithm, was .675. Conclusion: The results of this concept-verification study show that machine learning classifiers can distinguish bone metastasis in patients with lung cancer. This provides a new research direction for the future use of non-invasive technology to identify bone metastasis in lung cancer. However, more prospective multicenter cohort studies are needed.

  • Research Article
  • Cited by 1
  • 10.11834/jig.230028
Survey of imitation learning: tradition and new advances
  • Jan 1, 2023
  • Journal of Image and Graphics
  • Chao Zhang + 5 more

Imitation learning (IL) integrates reinforcement learning and supervised learning: by observing expert demonstrations and learning the expert's strategy, it accelerates reinforcement learning. Because it introduces additional task-related information, imitation learning can optimize a policy faster than reinforcement learning alone, offering a way to alleviate the low-sample-efficiency problem. In recent years, imitation learning has become a popular framework for solving reinforcement learning problems, and a variety of algorithms and techniques have emerged to improve learning performance. Combined with the latest research in graphics and image processing, imitation learning has played an important role in domains such as game artificial intelligence (AI), robot control, and autonomous driving. Traditional imitation learning methods mainly comprise behavioral cloning (BC), inverse reinforcement learning (IRL), and adversarial imitation learning (AIL). Building on advances in computing power and in upstream graphics and image tasks (such as object recognition and scene understanding), imitation learning methods can integrate a variety of emerging technologies for complex tasks. We further summarize and analyze imitation learning from observation (ILfO) and cross-domain imitation learning (CDIL). ILfO relaxes the requirements on expert demonstrations: the agent learns only from observable information, without specific action information from the expert. This setting makes imitation learning algorithms more practical and applicable to real-life scenes.
Depending on whether the environment's transition dynamics are modeled, ILfO algorithms can be divided into two categories: model-based and model-free. Model-based methods, according to how the model is constructed during the agent's interaction with the environment, can be further divided into forward dynamics models and inverse dynamics models. Model-free methods are mainly adversarial-based or rely on reward-function engineering. Cross-domain imitation learning addresses settings where agents and experts occupy different domains, such as multiple Markov decision processes. Current CDIL research mainly focuses on domain differences along three aspects: transition dynamics, morphology, and viewpoint. Technical solutions to CDIL problems can be mainly divided into direct, mapping, adversarial, and optimal-transport methods. Imitation learning is applied mainly in game AI, robot control, and autonomous driving, where the recognition and perception capabilities of intelligent agents are further improved by image-processing tasks such as object detection, video understanding, video classification, and video recognition. This survey analyzes the annual development of imitation learning from five aspects: behavioral cloning, inverse reinforcement learning, adversarial imitation learning, imitation learning from observation, and cross-domain imitation learning; it reviews the latest applications, compares domestic and international research, and outlines future directions, aiming to give researchers and practitioners a convenient reference on the state of the art.

  • Conference Article
  • Cited by 3
  • 10.1109/icicip47338.2019.9012185
Comparison of Control Methods Based on Imitation Learning for Autonomous Driving
  • Dec 1, 2019
  • Yinfeng Gao + 7 more

Recently, some learning-based methods such as reinforcement learning and imitation learning have been used to address the control problem for autonomous driving. Note that reinforcement learning relies strongly on the simulation environment and requires a handcrafted reward function; considering the many factors involved in autonomous driving, a general evaluation method is still being explored. The purpose of imitation learning is to learn the control policy from human demonstrations. It is meaningful to compare the control performance of the current main imitation learning methods on a common dataset. In this paper, we compare three typical imitation learning algorithms: behavior cloning, Dataset Aggregation (DAgger), and Information maximizing Generative Adversarial Imitation Learning (InfoGAIL) in The Open Racing Car Simulator (TORCS) and Car Learning to Act (CARLA) simulators, respectively. The performance of the algorithms is evaluated on the lane-keeping task in racing and urban environments. The experimental results show that DAgger performs best on the simple lane-keeping problem, and InfoGAIL has the unique advantage of distinguishing different driving styles from expert demonstrations.

  • Research Article
  • Cited by 5
  • 10.1609/aaai.v38i14.29470
DiffAIL: Diffusion Adversarial Imitation Learning
  • Mar 24, 2024
  • Proceedings of the AAAI Conference on Artificial Intelligence
  • Bingzheng Wang + 4 more

Imitation learning aims to solve the problem of defining reward functions in real-world decision-making tasks. The current popular approach is the Adversarial Imitation Learning (AIL) framework, which matches expert state-action occupancy measures to obtain a surrogate reward for forward reinforcement learning. However, the traditional discriminator is a simple binary classifier and doesn't learn an accurate distribution, which may result in failing to identify expert-level state-action pairs induced by the policy interacting with the environment. To address this issue, we propose a method named diffusion adversarial imitation learning (DiffAIL), which introduces the diffusion model into the AIL framework. Specifically, DiffAIL models the state-action pairs as unconditional diffusion models and uses diffusion loss as part of the discriminator's learning objective, which enables the discriminator to capture better expert demonstrations and improve generalization. Experimentally, the results show that our method achieves state-of-the-art performance and significantly surpasses expert demonstration on two benchmark tasks, including the standard state-action setting and state-only settings.

  • Book Chapter
  • 10.1007/978-981-19-6613-2_213
Air Combat Game Based on Behavior Repetition
  • Jan 1, 2023
  • Zhang Haoran + 3 more

Air combat decision-making is a process in which both sides choose strategies to maximize the probability of winning according to the current air combat situation. Imitation learning is a way for an agent to make intelligent decisions by imitating expert examples. In this paper, aircraft confrontation strategies are trained by behavior repetition, a kind of imitation learning. For an aircraft model with three degrees of freedom, seven basic maneuvers were designed as alternative strategies, taking 1V1 and 2V2 air combat confrontations as research objects. Victory samples were obtained by Monte Carlo simulation, and state-action pairs were extracted from them. On this basis, neural network training was carried out, yielding a model usable for air combat decision-making in the 1V1 condition. In the 2V2 condition, we focus on finding the winning strategy through cooperation between agents. The validity of the decision model is demonstrated by simulation. Keywords: Imitation learning; Air combat simulation; Cooperative strategy

  • Research Article
  • 10.3390/math12182930
On Convergence Rate of MRetrace
  • Sep 20, 2024
  • Mathematics
  • Xingguo Chen + 4 more

Off-policy learning is a key setting for reinforcement learning algorithms. In recent years, the stability of off-policy learning for value-based reinforcement learning has been guaranteed even when combined with linear function approximation and bootstrapping. Convergence rate analysis is currently a hot topic; however, the convergence rates of learning algorithms vary, and analyzing the reasons behind this remains an open problem. In this paper, we propose an essentially simplified version of a convergence rate for general off-policy temporal difference learning algorithms. We emphasize that the primary determinant influencing the convergence rate is the minimum eigenvalue of the key matrix. Furthermore, we conduct a comparative analysis of this influencing factor across various off-policy learning algorithms in diverse numerical scenarios. The experimental findings validate the proposed determinant, which serves as a benchmark for the design of more efficient learning algorithms.
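The "key matrix" whose minimum eigenvalue the abstract highlights can be computed directly for a toy problem. A sketch under the standard linear-TD formulation A = Φᵀ D (I − γP) Φ, with a made-up 3-state Markov chain and 2-D features (none of this is the paper's experimental setup):

```python
import numpy as np

# Made-up 3-state Markov chain (uniform stationary distribution,
# since each column of P sums to 1) and 2-D linear features.
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])
gamma = 0.9
d = np.full(3, 1.0 / 3.0)            # stationary distribution
Phi = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [1.0, 1.0]])

# Key matrix of linear TD: A = Phi^T D (I - gamma * P) Phi.
A = Phi.T @ np.diag(d) @ (np.eye(3) - gamma * P) @ Phi

# The convergence-rate claim concerns the minimum eigenvalue of the
# symmetric part of A; on-policy it is positive, so iterates contract.
min_eig = np.linalg.eigvalsh(0.5 * (A + A.T)).min()
```

Comparing this eigenvalue across algorithms (e.g. after replacing D with an off-policy weighting) is the kind of analysis the abstract describes.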
