Active preference-based Gaussian process regression for reward learning and optimization

Erdem Bıyık, Nicolas Huynh, Mykel J Kochenderfer, Dorsa Sadigh

doi:10.1177/02783649231208729

Abstract

Designing reward functions is a difficult task in AI and robotics. Directly specifying all the desirable behaviors a robot needs to optimize often proves challenging for humans. A popular solution is to learn reward functions from expert demonstrations. This approach, however, is fraught with challenges. Some methods require heavily structured models, for example, reward functions that are linear in some predefined set of features, while others adopt less structured reward functions that may necessitate tremendous amounts of data. Moreover, it is difficult for humans to provide demonstrations on robots with high degrees of freedom, or even to quantify reward values for given trajectories. To address these challenges, we present a preference-based learning approach, where human feedback takes the form of comparisons between trajectories. We do not assume highly constrained structures on the reward function. Instead, we employ a Gaussian process to model the reward function and propose a mathematical formulation to actively fit the model using only human preferences. Our approach enables us to tackle both the inflexibility and the data-inefficiency problems within a preference-based learning framework. We further analyze our algorithm in comparison to several baselines on reward optimization, where the goal is to find the optimal robot trajectory in a data-efficient way instead of learning the reward function for every possible trajectory. Our results in three different simulation experiments and a user study show that our approach can efficiently learn expressive reward functions for robotic tasks, and that it outperforms the baselines in both reward learning and reward optimization.
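
The idea of fitting a Gaussian process reward model from pairwise comparisons alone can be illustrated with a minimal sketch. The abstract does not give the formulation, so everything here is an assumption made for illustration: the RBF kernel, the probit comparison likelihood, the damped fixed-point iteration used to find a MAP estimate, and all function names (`rbf_kernel`, `pref_prob`, `map_rewards`). The active query-selection step is not covered.

```python
import math
import numpy as np

def rbf_kernel(X, length_scale=1.0):
    """Squared-exponential kernel matrix over the rows of X (assumed kernel)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dists / length_scale**2)

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def pref_prob(f, a, b, noise=1.0):
    """Assumed probit model: probability that item a is preferred to item b."""
    return norm_cdf((f[a] - f[b]) / (math.sqrt(2.0) * noise))

def map_rewards(X, prefs, noise=1.0, steps=500, lr=0.1):
    """MAP reward values at the rows of X, given prefs = [(winner, loser), ...].

    With a zero-mean GP prior f ~ N(0, K), the MAP estimate satisfies
    f = K g(f), where g is the gradient of the log-likelihood. This sketch
    iterates a damped version of that fixed-point equation.
    """
    K = rbf_kernel(X)
    f = np.zeros(len(X))
    scale = math.sqrt(2.0) * noise
    for _ in range(steps):
        g = np.zeros(len(X))
        for winner, loser in prefs:
            z = (f[winner] - f[loser]) / scale
            # Derivative of log Phi(z) with respect to f[winner].
            slope = norm_pdf(z) / (max(norm_cdf(z), 1e-12) * scale)
            g[winner] += slope
            g[loser] -= slope
        f = (1.0 - lr) * f + lr * (K @ g)  # damped step toward f = K g
    return f

# Hypothetical data: four trajectories described by one feature each,
# with the human preferring the larger feature in every comparison.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
prefs = [(3, 2), (2, 1), (1, 0)]
f_map = map_rewards(X, prefs)
```

In this toy setup the estimated rewards in `f_map` increase with the feature value, and `pref_prob(f_map, 3, 0)` gives the modeled probability that the last trajectory is preferred over the first, a pair that was never compared directly. Only differences between reward values are identified by comparisons, so the absolute scale of `f_map` is set by the prior.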

Journal: The International Journal of Robotics Research
Publication Date: Nov 7, 2023
Citations: 3

Similar Papers

Active Preference-Based Gaussian Process Regression for Reward Learning
Erdem Biyik ... Dorsa Sadigh
12 Jul 2020

What Is It You Really Want of Me? Generalized Reward Learning with Biased Beliefs about Domain Dynamics
Ze Gong ... Yu Zhang
Proceedings of the AAAI Conference on Artificial Intelligence | VOL. 34
03 Apr 2020

Optimal Rewards for Cooperative Agents
Bingyao Liu ... Satinder Singh
IEEE Transactions on Autonomous Mental Development | VOL. 6
01 Dec 2014

Weak Human Preference Supervision for Deep Reinforcement Learning
Zehong Cao ... Chin-Teng Lin
IEEE Transactions on Neural Networks and Learning Systems | VOL. 32
01 Dec 2021
