Learning Acceleration Method for Reinforcement Learning Agents by Knowledge Selection Based on the State Value and the Number of State Observation

Abstract

Transfer learning is one method of improving training efficiency for reinforcement learning (RL) agents: knowledge acquired in past tasks, or given by experts, is transferred to similar tasks. However, if the RL agents retain knowledge that is not effective for improving learning speed or quality, negative transfer occurs and learning efficiency decreases. This study proposes a method by which RL agents delete knowledge whose number of state observations, accumulated through trial and error, is low. The proposed method was applied to path planning for a two-wheeled mobile robot to verify its effectiveness, and it was confirmed that the proposed method reduced the amount of knowledge by 98% and the learning time by 32%.
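The selection rule the abstract describes can be sketched as follows. This is an illustrative reconstruction, not the authors' code; `prune_knowledge`, the thresholds, and the toy tables are all assumptions:

```python
# Illustrative reconstruction (not the authors' code) of knowledge
# selection: keep only Q-table entries whose state was observed often
# enough and whose state value is high enough; delete the rest.
def prune_knowledge(q_table, visit_counts, state_values,
                    min_visits=5, min_value=0.0):
    """Return the subset of knowledge worth keeping for transfer."""
    return {
        state: actions
        for state, actions in q_table.items()
        if visit_counts.get(state, 0) >= min_visits
        and state_values.get(state, float("-inf")) >= min_value
    }

# Toy knowledge base: s1 has a low state value, s2 was rarely observed.
q = {"s0": {"up": 1.2}, "s1": {"up": -0.3}, "s2": {"up": 0.8}}
visits = {"s0": 12, "s1": 9, "s2": 2}
values = {"s0": 1.2, "s1": -0.3, "s2": 0.8}
pruned = prune_knowledge(q, visits, values)  # only "s0" survives
```

Deleting rarely observed, low-value entries is one plausible way to shrink the knowledge base by the order of magnitude the abstract reports.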

Similar Papers
  • Conference Article
  • Cited by 4
  • 10.1109/aeeca49918.2020.9213574
A Spectrum Handoff Method Based on Reinforcement and Transfer Learning
  • Aug 1, 2020
  • Jiaxing Zhao + 3 more

This paper designs a spectrum handoff method based on reinforcement and transfer learning in a cognitive radio environment. In the context of secondary users adopting reinforcement learning to form a spectrum handoff strategy, transfer learning is used to increase the convergence speed of reinforcement learning for new users. First, the original secondary user completes reinforcement learning in a radio environment. Then, the original secondary user is treated as an expert user, and the Q table obtained through reinforcement learning is transferred to the newly arrived secondary users. Finally, the new users complete their own reinforcement learning based on the Q table. Through simulation experiments comparing the reinforcement learning convergence process of new secondary users with and without transfer learning, it is found that transfer learning can significantly improve the convergence rate of new users.
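The Q-table transfer this abstract describes can be illustrated with a minimal sketch; the channel states, reward, and `q_update` helper below are hypothetical, not from the paper:

```python
import copy

# Hypothetical sketch of the transfer step described above: the new
# secondary user starts from a copy of the expert's Q table instead of
# zeros, then refines it with ordinary Q-learning updates. States,
# actions, and values are made up for illustration.
def q_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning update on the transferred table."""
    best_next = max(q[s_next].values())
    q[s][a] += alpha * (r + gamma * best_next - q[s][a])

expert_q = {"ch0": {"stay": 0.9, "handoff": 0.2},
            "ch1": {"stay": 0.1, "handoff": 0.7}}

new_user_q = copy.deepcopy(expert_q)  # transfer: copy, don't share
q_update(new_user_q, "ch0", "stay", r=1.0, s_next="ch1")
```

Starting from the expert's values rather than an all-zero table is what shortens the new user's convergence in this scheme.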

  • Research Article
  • Cited by 24
  • 10.1016/j.tics.2020.09.002
Artificial Intelligence and the Common Sense of Animals.
  • Oct 8, 2020
  • Trends in Cognitive Sciences
  • Murray Shanahan + 3 more

  • Research Article
  • Cited by 2
  • 10.1016/j.apenergy.2024.124179
Optimal operational planning of a bio-fuelled cogeneration plant: Integration of sparse nonlinear dynamics identification and deep reinforcement learning
  • Aug 21, 2024
  • Applied Energy
  • Seyed Mohammad Asadzadeh + 1 more

This paper presents a novel data-driven approach for short-term operational planning of a cogeneration plant. The proposed methodology utilizes sparse identification of nonlinear dynamics (SINDy) to extract a dynamic model of heat generation from operational data. This model is then employed to simulate the plant dynamics during the training of a reinforcement learning (RL) agent, enabling online stochastic optimization of the production plan in real-time. The incorporation of SINDy enhances the accuracy of capturing the plant's nonlinear dynamics and significantly improves the computational speed of plant simulations, enabling efficient RL agent training within a reasonable timeframe. The performance of operational planning with the RL agent is compared to that of dynamic programming, a widely used method in the literature. The evaluation metric encompasses energy efficiency, unmet demands, and wasted heat. The comparison investigates the effectiveness of RL and dynamic programming under various scenarios with different qualities of energy demand forecasts. The RL agent exhibits robustness and notably improves the operational planning performance, particularly when faced with uncertain energy demands in the environment. Furthermore, the findings show that the RL agent, trained on a school building data, could successfully perform planning tasks for a hotel building, indicating the transferability of learned planning knowledge across different cogeneration use cases.

  • Conference Article
  • Cited by 4
  • 10.23919/chicc.2017.8028754
Transfer learning via linear multi-variable mapping under reinforcement learning framework
  • Jul 1, 2017
  • Qiao Cheng + 2 more

Though popular in many agent learning tasks, reinforcement learning still faces problems, such as long learning times in complex environments. Transfer learning can shorten the learning time and improve performance in reinforcement learning by reusing knowledge acquired from a different but related source task. Because the state space and/or action space of the target and source tasks differ, transfer via inter-task mapping is a popular method, and the design of the inter-task mapping is critical to it. In this paper, we propose a linear multi-variable mapping (LMVM) for transfer learning to make better use of the knowledge learned from the source task. Unlike previously used inter-task mappings, the LMVM is not a one-to-one but a one-to-many mapping, based on the idea that an element in the target task is related to several similar elements from the source task. We test transfer learning via our new mapping on the Keepaway platform. The experimental results show that our method makes reinforcement learning agents learn much faster than those without transfer and those transferring with simpler mappings.
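A one-to-many linear mapping in the spirit of LMVM can be sketched as below; the variable names and weights are illustrative assumptions, not values from the paper:

```python
# Sketch of a one-to-many linear mapping in the spirit of LMVM: each
# target-task variable is a weighted linear combination of several
# related source-task variables, rather than a copy of exactly one.
def map_source_to_target(source_state, weights):
    """weights: target_var -> list of (source_var, coefficient)."""
    return {t: sum(c * source_state[s] for s, c in combo)
            for t, combo in weights.items()}

# Toy Keepaway-style variables (hypothetical names and coefficients).
source_state = {"dist_keeper1": 4.0, "dist_keeper2": 6.0}
weights = {"dist_taker": [("dist_keeper1", 0.5), ("dist_keeper2", 0.5)]}
target_state = map_source_to_target(source_state, weights)
```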

  • Conference Article
  • Cited by 21
  • 10.1109/itsc.2010.5624977
Arterial traffic control using reinforcement learning agents and information from adjacent intersections in the state and reward structure
  • Sep 1, 2010
  • Juan C Medina + 2 more

An application that uses reinforcement learning (RL) agents for traffic control along an arterial under high traffic volumes is presented. RL agents were trained using Q learning and a modified version of the state representation that included information on the occupancy of the links from neighboring intersections. The proposed structure also includes a reward that considers potential blockage from downstream intersections (due to saturated conditions), as well as pressure to coordinate the signal response with the future arrival of traffic from upstream intersections. Experiments using microscopic simulation software were conducted for an arterial with 5 intersections under high conflicting volumes, and results were compared with the best settings of coordinated pre-timed phasing. Data showed lower delays and fewer stops with RL agents, as well as a more balanced distribution of the delay among all vehicles in the system. Evidence of coordinated-like behavior was found, as the number of stops to traverse the 5 intersections was on average lower than 1.5 and the distribution of green times across all intersections was very similar. As traffic approached capacity, however, delays with the pre-timed phasing were lower than with RL agents, but the agents produced lower maximum delay times and lower maximum numbers of stops per vehicle. Future research will analyze variable coefficients in the state and reward structures for the system to better cope with a wide variety of traffic volumes, including transitions from oversaturation to undersaturation and vice versa.

  • Research Article
  • Cited by 1
  • 10.3390/act11050140
Body Calibration: Automatic Inter-Task Mapping between Multi-Legged Robots with Different Embodiments in Transfer Reinforcement Learning
  • May 21, 2022
  • Actuators
  • Satoru Ikeda + 3 more

Machine learning algorithms are effective in realizing the programming of robots that behave autonomously for various tasks. For example, reinforcement learning (RL) does not require supervision or data sets; the RL agent explores solutions by itself. However, RL requires a long learning time, particularly in actual robot learning situations. Transfer learning (TL) in RL has been proposed to address this limitation. TL realizes fast adaptation and decreases the problem-solving time by utilizing knowledge of the policy, value function, and Q-function from RL. Taylor proposed TL using inter-task mapping, which defines the correspondence between states and actions in the source and target domains. Inter-task mapping is usually defined based on human intuition and experience; therefore, the expected benefit of TL may not be obtained. Transferring between robots with different shapes is analogous to how humans cognitively adapt to changes in their own body composition, and automatic inter-task mapping can be performed by referring to the body representation that is assumed to be stored in the human brain. In this paper, body calibration is proposed, which draws on that body representation: it realizes automatic inter-task mapping by acquiring data modeled on a body diagram that illustrates body composition and posture. The proposed method is evaluated in a TL situation from a computer simulation of RL to actual robot control with a multi-legged robot.

  • Research Article
  • Cited by 13
  • 10.1186/s12868-016-0302-7
‘Proactive’ use of cue-context congruence for building reinforcement learning’s reward function
  • Oct 28, 2016
  • BMC Neuroscience
  • Judit Zsuga + 6 more

Background: Reinforcement learning is a fundamental form of learning that may be formalized using the Bellman equation. Accordingly, an agent determines the state value as the sum of the immediate reward and the discounted value of future states. Thus the value of a state is determined by agent-related attributes (action set, policy, discount factor) and the agent's knowledge of the environment, embodied by the reward function and hidden environmental factors given by the transition probability. The central objective of reinforcement learning is to solve these two functions outside the agent's control, either using or not using a model. Results: In the present paper, using the proactive model of reinforcement learning, we offer insight on how the brain creates simplified representations of the environment, and how these representations are organized to support the identification of relevant stimuli and actions. Furthermore, we identify neurobiological correlates of our model by suggesting that the reward and policy functions, attributes of the Bellman equation, are built by the orbitofrontal cortex (OFC) and the anterior cingulate cortex (ACC), respectively. Conclusions: Based on this, we propose that the OFC assesses cue-context congruence to activate the most relevant context frame. Furthermore, given the bidirectional neuroanatomical link between the OFC and model-free structures, we suggest that model-based input is incorporated into the reward prediction error (RPE) signal, and conversely that the RPE signal may be used to update the reward-related information of context frames and the policy underlying action selection in the OFC and ACC, respectively. Clinical implications for cognitive behavioral interventions are also discussed.
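The Bellman equation that the Background refers to can be written out explicitly; this is the standard state-value form under a policy $\pi$, not notation taken from the paper itself:

```latex
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)
             \bigl[ R(s, a, s') + \gamma \, V^{\pi}(s') \bigr]
```

Here $\gamma$ is the discount factor, $P$ the transition probability, and $R$ the reward function, matching the agent-related attributes and environmental knowledge the abstract lists.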

  • Conference Article
  • Cited by 34
  • 10.5555/2343576.2343631
Reinforcement learning transfer via sparse coding
  • Jun 4, 2012
  • Haitham Bou Ammar + 4 more

Although reinforcement learning (RL) has been successfully deployed in a variety of tasks, learning speed remains a fundamental problem for applying RL in complex environments. Transfer learning aims to ameliorate this shortcoming by speeding up learning through the adaptation of previously learned behaviors in similar tasks. Transfer techniques often use an inter-task mapping, which determines how a pair of tasks are related. Instead of relying on a hand-coded inter-task mapping, this paper proposes a novel transfer learning method capable of autonomously creating an inter-task mapping by using a novel combination of sparse coding, sparse projection learning and sparse Gaussian processes. We also propose two new transfer algorithms (TrLSPI and TrFQI) based on least squares policy iteration and fitted Q-iteration. Experiments show successful transfer of information not only between similar tasks (inverted pendulum to cart pole) but also between two very different domains (mountain car to cart pole). This paper empirically shows that the learned inter-task mapping can be successfully used to (1) improve the performance of a learned policy on a fixed number of environmental samples, (2) reduce the learning times needed by the algorithms to converge to a policy on a fixed number of samples, and (3) converge faster to a near-optimal policy given a large number of samples.

  • Conference Article
  • 10.1109/indin51773.2022.9976094
Curriculum Learning in Peristaltic Sortation Machine
  • Jul 25, 2022
  • Mohammed Sharafath Abdul Hameed + 3 more

This paper presents a novel approach to train a Reinforcement Learning (RL) agent faster for the transportation of parcels in a Peristaltic Sortation Machine (PSM) using curriculum learning (CL). The PSM was developed as a means to transport parcels using an actuator and a flexible film, where an RL agent is trained to control the actuator. In a previous paper, the actuator was trained in a Discrete Element Method (DEM) simulation environment of the PSM, developed using the open-source DEM library LIGGGHTS, which reduced the training time of the transportation task compared to the real machine. However, it still took days to train the agent, and the objective of this paper is to reduce the training time to hours. To that end, we developed a faster but lower-fidelity Python simulation environment (PSE) capable of simulating the transportation task of the PSM, and used it with a curriculum learning approach to accelerate training. The RL agent is trained in two steps in the PSE: first with a fixed set of goal positions, and then with randomized goal positions. Additionally, we use Gradient Monitoring (GM), a gradient regularization method that provides additional trust-region constraints in the policy updates of the RL agent when switching between tasks. The agent so trained is then deployed and tested in the DEM environment, where it has not been trained before. The results show that the RL agent trained using CL and the PSE successfully completes the tasks in the DEM environment without any loss in performance, while using only a fraction of the training time (1.87%) per episode. This will allow faster prototyping of algorithms to be tested on the PSM in the future.

  • Research Article
  • Cited by 2
  • 10.3906/elk-2008-94
Relational-grid-world: a novel relational reasoning environment and an agent model for relational information extraction
  • Mar 30, 2021
  • TURKISH JOURNAL OF ELECTRICAL ENGINEERING & COMPUTER SCIENCES
  • Faruk Küçüksubaşi + 1 more

Reinforcement learning (RL) agents are often designed specifically for a particular problem, and they generally have uninterpretable working processes. Statistical-methods-based agent algorithms can be improved in terms of generalizability and interpretability using symbolic artificial intelligence (AI) tools such as logic programming. In this study, we present a model-free RL architecture that is supported with explicit relational representations of the environmental objects. For the first time, we use the PrediNet network architecture in a dynamic decision-making problem rather than image-based tasks, and the multi-head dot-product attention network (MHDPA) as a baseline for performance comparisons. We tested the two networks in two environments: the baseline box-world environment and our novel environment, relational-grid-world (RGW). With the procedurally generated RGW environment, which is complex in terms of visual perceptions and combinatorial selections, it is easy to measure the relational representation performance of RL agents. The experiments were carried out using different configurations of the environment so that the presented module and the environment could be compared with the baselines. We reached policy optimization performance similar to that of the PrediNet architecture and MHDPA. Additionally, we were able to extract the propositional representation explicitly, which makes the agent's statistical policy logic more interpretable and tractable. This flexibility in the agent's policy provides convenience for designing non-task-specific agent architectures. The main contributions of this study are twofold: an RL agent that can explicitly perform relational reasoning, and a new environment that measures the relational reasoning capabilities of RL agents.

  • Dissertation
  • Cited by 2
  • 10.25534/tuprints-00011372
Reinforcement Learning with Sparse and Multiple Rewards
  • Feb 13, 2020
  • Simone Parisi

  • Conference Article
  • Cited by 10
  • 10.5555/2936924.2937000
Resource Abstraction for Reinforcement Learning in Multiagent Congestion Problems
  • May 9, 2016
  • Kleanthis Malialis + 2 more

Real-world congestion problems (e.g. traffic congestion) are typically very complex and large-scale. Multiagent reinforcement learning (MARL) is a promising candidate for dealing with this emerging complexity by providing an autonomous and distributed solution to these problems. However, there are three limiting factors that affect the deployability of MARL approaches to congestion problems: learning time, scalability, and decentralised coordination, i.e. no communication between the learning agents. In this paper we introduce Resource Abstraction, an approach that addresses these challenges by allocating the available resources into abstract groups. This abstraction creates new reward functions that provide a more informative signal to the learning agents and aid the coordination amongst them. Experimental work is conducted on two benchmark domains from the literature, an abstract congestion problem and a realistic traffic congestion problem. The current state-of-the-art for solving multiagent congestion problems is a form of reward shaping called difference rewards. We show that the system using Resource Abstraction significantly improves the learning speed and scalability, and achieves the highest possible or near-highest joint performance/social welfare for both congestion problems in large-scale scenarios involving up to 1000 reinforcement learning agents.
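The grouping-and-reward idea above can be sketched briefly; the grouping, capacities, and linear penalty below are assumptions for illustration, not the paper's actual reward functions:

```python
# Hedged sketch of resource abstraction: individual resources are
# allocated to abstract groups, and the reward is computed per group
# rather than per resource, giving agents a more informative signal.
def abstract_reward(resource_usage, groups, capacity):
    """Negative penalty for congestion measured at the group level."""
    usage = {g: sum(resource_usage[r] for r in members)
             for g, members in groups.items()}
    return -sum(max(0, usage[g] - capacity[g]) for g in groups)

# Toy traffic-style example: two abstract groups over three lanes.
usage = {"lane1": 3, "lane2": 5, "lane3": 2}
groups = {"north": ["lane1", "lane2"], "south": ["lane3"]}
capacity = {"north": 6, "south": 4}
reward = abstract_reward(usage, groups, capacity)  # north exceeds by 2
```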

  • Conference Article
  • Cited by 3
  • 10.1109/ichms56717.2022.9980765
Bootstrapping Human-Autonomy Collaborations by using Brain-Computer Interface of SSVEP for Multi-Agent Deep Reinforcement Learning
  • Nov 17, 2022
  • Joshua Ho + 8 more

Human-Autonomy Teaming (HAT) has become one of the emerging AI trends, owing to advances in sophisticated machine design that allow closer cooperation with humans while machines perform moral, reasonable, and applicable tasks as exemplary assistants. Given HAT's pursuit of a collective goal and shared authority between humans and machines, our research aims to answer whether a human brain-computer interface (BCI) helps achieve efficient collaboration between humans and Reinforcement Learning (RL) agents, and how it can efficiently facilitate human-in-the-loop guidance to bootstrap the training of the agents. This study proposes a BCI-based system that interacts with RL agents as a human-in-the-loop teaming integration. The neural responses elicited by the Steady-State Visual Evoked Potential in BCI facilitate the collaboration of learning agents with humans and accomplish this goal in a game simulation environment. The results of our proposed system, NeuroRL, show significant improvement by reducing the non-stationarity of exploitations and explorations in the RL agents. With BCI-assisted human-in-the-loop guidance, the rewards can be optimized during the early investigations to achieve more efficient convergence in the training. The novel design proposed in this study can extend the development of the emerging HAT field and knowledge-based RL systems for various applications in dynamic environments.

  • Book Chapter
  • Cited by 5
  • 10.1007/978-3-642-11876-0_31
Efficient Behavior Learning by Utilizing Estimated State Value of Self and Teammates
  • Jan 1, 2010
  • Kouki Shimada + 2 more

Reinforcement learning applications to real robots in multi-agent dynamic environments are limited because of the huge exploration space and enormously long learning time. A typical example is RoboCup competition, since other agents and their behavior easily cause state and action space explosion. This paper presents a method that utilizes state value functions of macro actions to explore appropriate behavior efficiently in a multi-agent environment, by which the learning agent can acquire cooperative behavior with its teammates and competitive behavior against its opponents. The key ideas are as follows. First, the agent learns a few macro actions and their state value functions based on reinforcement learning beforehand. Second, an appropriate initial controller for learning cooperative behavior is generated based on the state value functions: the initial controller utilizes the state values of the macro actions so that the learner tends to select good macro actions and not select useless ones. By combining these ideas with a two-layer hierarchical system, the proposed method shows better performance during learning than conventional methods. This paper shows a case study of a 4 (defense team) on 5 (offense team) game task, in which the learning agent (a passer of the offense team) successfully acquired teamwork plays (pass and shoot) within a shorter learning time.
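The initial-controller idea above can be sketched as follows; this is an assumed illustration, not the authors' implementation, and the macro actions, value functions, and state features are made up:

```python
# Sketch (assumed, not the authors' implementation) of the initial
# controller idea: among pre-learned macro actions, select the one
# whose learned state value function is highest in the current state,
# so useless macro actions are rarely chosen early in learning.
def select_macro_action(state, value_functions):
    """Pick the macro action with the highest state value here."""
    return max(value_functions, key=lambda m: value_functions[m](state))

# Toy pre-learned value functions for two macro actions.
value_functions = {
    "pass": lambda s: 1.0 - 0.1 * s["opponent_distance"],
    "shoot": lambda s: s["goal_proximity"],
}
state = {"opponent_distance": 2.0, "goal_proximity": 0.9}
chosen = select_macro_action(state, value_functions)  # "shoot" wins
```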

  • Book Chapter
  • Cited by 5
  • 10.1007/978-3-030-05363-5_17
Design of Transfer Reinforcement Learning Mechanisms for Autonomous Collision Avoidance
  • Jan 1, 2019
  • Xiongqing Liu + 1 more

It is often hard for a reinforcement learning (RL) agent to utilize previous experience to solve new, similar but more complex tasks. In this research, we combine transfer learning with reinforcement learning and investigate how the hyperparameters of both impact learning effectiveness and task performance in the context of autonomous robotic collision avoidance. A deep reinforcement learning algorithm was first implemented for a robot to learn, from its experience, how to avoid randomly generated single obstacles. After that, the effect of transferring previously learned experience was studied by introducing two important concepts: transfer belief, i.e., how much a robot should believe in its previous experience, and transfer period, i.e., how long the previous experience should be applied in the new context. The proposed approach has been tested on collision avoidance problems by altering the transfer period. It is shown that transfer learning on average gave a ~50% speed increase at ~30% competence levels, and that there exists an optimal transfer period where the variance is lowest and learning is fastest.
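The two hyperparameters described above can be sketched with a toy blending rule; the rule itself, the decay schedule, and the Q-values are assumptions, not the paper's formulation:

```python
# Illustrative sketch of the two hyperparameters described above:
# "transfer belief" weights the source task's Q-values against the
# newly learned ones, and "transfer period" bounds how long the old
# experience is consulted. The blending rule is an assumption.
def blended_q(q_old, q_new, step, transfer_belief=0.7,
              transfer_period=1000):
    """Blend source and target Q-values during the transfer period."""
    if step >= transfer_period:
        return dict(q_new)  # transfer period over: new knowledge only
    b = transfer_belief * (1 - step / transfer_period)  # belief decays
    return {a: b * q_old[a] + (1 - b) * q_new[a] for a in q_new}

q_old = {"left": 1.0, "right": 0.0}
q_new = {"left": 0.0, "right": 0.5}
early = blended_q(q_old, q_new, step=0)    # dominated by old experience
late = blended_q(q_old, q_new, step=2000)  # old experience discarded
```

An optimal transfer period, as the abstract reports, would correspond to choosing the cutoff so the old experience is dropped once it stops helping.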
