Abstract

This work studies the integration of human feedback into reinforcement learning (RL) algorithms, using the CartPole environment as a testbed. We present RLHFAgent, an RL agent designed to leverage human guidance during training in order to accelerate learning. By collecting feedback from a human operator, RLHFAgent updates its policy more efficiently, yielding improved pole-balancing performance. Our approach trains a neural network that approximates the policy function, mapping observations to actions, and updates this model based on human feedback. Through a series of experiments, we demonstrate that RLHFAgent learns to balance the pole, as evidenced by consistently increasing episodic rewards and decreasing episodic loss over the course of training. These findings indicate that incorporating human intuition into RL algorithms can improve their adaptability and speed up learning in complex environments. This study thus contributes to ongoing efforts to bridge the gap between RL algorithms and human expertise, paving the way for more efficient and effective learning strategies in both simulated and real-world settings.
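To make the described setup concrete, the following is a minimal sketch of a CartPole policy network and a human-feedback-weighted policy-gradient update. The paper does not publish code, so all names here (PolicyNet, update_policy, human_scores, the blending weight beta) are illustrative assumptions, not the authors' actual implementation; the sketch assumes per-step human feedback scores (e.g. +1 approve, -1 disapprove) blended into the environment reward.

```python
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Hypothetical policy network for CartPole: 4 observations -> 2 action logits."""

    def __init__(self, obs_dim: int = 4, n_actions: int = 2, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Logits feed a Categorical distribution from which actions are sampled.
        return self.net(obs)


def update_policy(optimizer, log_probs, env_rewards, human_scores,
                  beta: float = 0.5, gamma: float = 0.99) -> float:
    """One REINFORCE-style update where each step's return blends the
    environment reward with a human feedback score (an assumed scheme,
    not necessarily the paper's exact update rule)."""
    blended = [r + beta * h for r, h in zip(env_rewards, human_scores)]

    # Discounted returns, accumulated backwards over the episode.
    returns, g = [], 0.0
    for r in reversed(blended):
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)

    # Policy-gradient loss: push up log-probabilities of high-return actions.
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In use, an episode would be rolled out with the policy while the operator scores each step (or batch of steps), and update_policy would be called once per episode; the blending weight beta controls how strongly human feedback shapes the learned policy relative to the environment reward.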
