Addressing toxic language in online discussions is crucial for developing effective toxicity detection models. This pioneering work tackles imbalanced datasets in toxicity detection by introducing a novel approach to augmenting toxic language data. We create a balanced dataset by instruction fine-tuning Large Language Models (LLMs) and aligning them using Reinforcement Learning from Human Feedback (RLHF). Recognizing the difficulty of collecting enough toxic samples from social media platforms to build a balanced dataset, our methodology performs sentence-level text data augmentation by paraphrasing existing samples with optimized generative LLMs. We employ Proximal Policy Optimization (PPO) as the RL algorithm to further fine-tune the generative LLM and align it with human feedback. Specifically, we first fine-tune an LLM on an instruction dataset tailored to paraphrasing while preserving semantic consistency. We then apply PPO with a reward function to further optimize the instruction-tuned LLM; this RL process guides the model toward generating toxic responses. The Google Perspective API serves as the toxicity evaluator, scoring generated responses and assigning rewards or penalties accordingly. Guided by PPO and this reward function, the LLM transforms minority-class samples into augmented versions. The primary goal of our methodology is to produce a balanced and diverse dataset that improves classifier accuracy and performance in identifying minority-class instances. Using two publicly available toxicity datasets, we compared several techniques for generating toxic samples against our proposed method, demonstrating that our approach produces the highest number of toxic samples. Starting from an initial 16,225 toxic prompts, our method generated 122,951 toxic samples with a toxicity score exceeding 30%. We then trained various classifiers on the generated balanced datasets and applied a cost-sensitive learning approach to the original imbalanced dataset. The findings show that classifiers trained on data generated by our proposed method perform best. These results underscore the importance of employing RL with a data-agnostic model as the reward mechanism for augmenting toxic data, thereby enhancing the robustness of toxicity detection models.
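Below is a minimal sketch of how a Perspective-API-based reward signal for the PPO step could be computed. It assumes a Python environment with the `requests` library and a valid Perspective API key; the request format follows the public Perspective API documentation, while the reward shaping and the 0.30 threshold mapping are illustrative assumptions rather than the paper's exact implementation.

```python
# Sketch: map a generated paraphrase's Perspective API toxicity score to a
# reward/penalty that an RL (PPO) fine-tuning loop could consume.
# The reward shaping below is an assumption for illustration only.
import requests

PERSPECTIVE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"


def toxicity_score(text: str, api_key: str) -> float:
    """Return the Perspective API TOXICITY summary score in [0, 1]."""
    payload = {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
        "doNotStore": True,
    }
    resp = requests.post(
        PERSPECTIVE_URL, params={"key": api_key}, json=payload, timeout=10
    )
    resp.raise_for_status()
    return resp.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]


def reward(paraphrase: str, api_key: str, threshold: float = 0.30) -> float:
    """Assign a positive reward to sufficiently toxic paraphrases, a penalty otherwise.

    The 0.30 threshold mirrors the 30% toxicity cutoff mentioned in the abstract;
    the exact penalty value is a hypothetical choice.
    """
    score = toxicity_score(paraphrase, api_key)
    return score if score >= threshold else score - 1.0
```

In such a setup, each paraphrase sampled from the instruction-tuned LLM would be scored by this function, and the resulting reward would drive the PPO update toward generating augmented samples that the evaluator judges as toxic.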