Risk-minimization through Q-learning of the learning rate

Kerstin Preuschoff1* and Peter Bossaerts2

1 University of Zurich, Social and Neural Systems Lab, Switzerland
2 California Institute of Technology, United States

Abstract

In reinforcement learning, the learning rate is a fundamental parameter that determines how strongly past prediction errors affect future predictions. Traditionally, the learning rate is kept constant, either tuned to the learning problem at hand or, in behavioral and imaging experiments, fit to the observed data. In stable environments this approach works well, although the fitted learning rate varies with the underlying probabilities and the number of trials played. In uncertain environments, however, prediction performance drops when the learning rate is kept constant. Adaptive learning rates may speed up learning and thus improve predictions over time.

We have previously proposed a way to adapt the learning rate in risky environments in which the underlying stochastic parameters do not change (zero volatility). In such an environment, the learning rate adapts as a function of risk, decreasing as risk increases. The optimal learning rate depends on the covariance between the optimal predictions and the immediately preceding prediction error. While this approach works well in theory, a history of optimal predictions is usually unavailable.

Here we propose to adapt the learning rate by minimizing the overall prediction risk (i.e., by maximizing prediction precision). The overall prediction risk can be viewed as a value function that represents the discounted sum of future prediction errors for a given learning rate; learning the best learning rate then amounts to minimizing this value function. This implicitly incorporates additional information about the underlying processes and thus accelerates learning. To achieve this, we borrow ideas from Q-learning and translate risk-sensitive reward learning into learning an action-value function that minimizes prediction risk using past reward prediction errors. The resulting optimization problem depends on the decision-maker's risk sensitivity and converges under the same conditions as standard Q-learning algorithms. Using the inverse prediction risk as the value and the reward-learning rate as the action, we show that the resulting policy adjusts the (reward-) learning rate as a function of the decision-maker's risk preferences. This learning rate depends on both the risk and the volatility of the environment: learning rates decrease with increasing risk and increase with increasing volatility, as shown by behavioral data (Behrens et al., 2007) and predicted by previous models (Preuschoff & Bossaerts, 2007; Behrens et al., 2007). Evidence is discussed suggesting that the dopaminergic system, insula, and ACC in the (human and nonhuman) primate brain support a risk-minimizing algorithm in addition to risk-sensitive reward learning. Together with the previous model, this approach can be used to incorporate the trade-off between expected reward and risk by adjusting the learning rate in reward-based learning. The model generalizes to risk-neutral as well as risk-seeking decision makers. In essence, it extracts information about the origin of uncertainty (e.g., risk vs. volatility) to decide how much weight to put on recent prediction errors compared to those that occurred many time steps ago.
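The abstract does not spell out the update equations, so the following is only a minimal sketch of the idea it describes: treat the learning rate as the action of a Q-learner whose cost is the squared prediction error, so that minimizing the action value corresponds to minimizing prediction risk (equivalently, maximizing prediction precision). All concrete choices below, including the grid of candidate learning rates, the step size eta, the discount gamma, the epsilon-greedy exploration, and the toy environment with one change point, are illustrative assumptions rather than the authors' specification.

# Illustrative sketch only; parameter names, values, and the toy environment are assumptions.
import numpy as np

rng = np.random.default_rng(0)

alphas = np.linspace(0.05, 0.95, 10)   # candidate learning rates (the "actions")
Q = np.zeros(len(alphas))              # action values: estimated prediction risk per learning rate
eta = 0.1                              # step size for the Q update (assumed)
gamma = 0.9                            # discount on future prediction risk (assumed)
epsilon = 0.1                          # exploration probability (assumed)

V = 0.0                                # current reward prediction

def environment(t):
    """Assumed toy environment: Bernoulli reward whose probability jumps
    halfway through (volatility), with outcome noise (risk)."""
    p = 0.8 if t < 500 else 0.2
    return float(rng.random() < p)

for t in range(1000):
    # Epsilon-greedy choice of learning rate: exploit the alpha with the
    # lowest estimated prediction risk, explore occasionally.
    if rng.random() < epsilon:
        a = rng.integers(len(alphas))
    else:
        a = int(np.argmin(Q))
    alpha = alphas[a]

    r = environment(t)
    delta = r - V                      # reward prediction error
    V = V + alpha * delta              # standard delta-rule prediction update

    # Q-learning-style update on squared prediction error as the "risk" cost;
    # minimizing Q here plays the role of maximizing prediction precision.
    cost = delta ** 2
    Q[a] = Q[a] + eta * (cost + gamma * Q.min() - Q[a])

print("learning rate with lowest estimated prediction risk:", alphas[int(np.argmin(Q))])

Under these assumptions, the greedy policy drifts toward smaller learning rates when outcome noise (risk) dominates and toward larger ones after the change point (volatility), mirroring the qualitative pattern the abstract attributes to the model and to the behavioral data of Behrens et al. (2007).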
Conference: Computational and Systems Neuroscience 2010, Salt Lake City, UT, United States, 25 Feb - 2 Mar, 2010.

Presentation Type: Poster Presentation

Topic: Poster session II

Citation: Preuschoff K and Bossaerts P (2010). Risk-minimization through Q-learning of the learning rate. Front. Neurosci. Conference Abstract: Computational and Systems Neuroscience 2010. doi: 10.3389/conf.fnins.2010.03.00235

Copyright: The abstracts in this collection have not been subject to any Frontiers peer review or checks, and are not endorsed by Frontiers. They are made available through the Frontiers publishing platform as a service to conference organizers and presenters. The copyright in the individual abstracts is owned by the author of each abstract or his/her employer unless otherwise stated. Each abstract, as well as the collection of abstracts, is published under a Creative Commons CC-BY 4.0 (attribution) licence (https://creativecommons.org/licenses/by/4.0/) and may thus be reproduced, translated, adapted and be the subject of derivative works provided the authors and Frontiers are attributed. For Frontiers' terms and conditions please see https://www.frontiersin.org/legal/terms-and-conditions.

Received: 04 Mar 2010; Published Online: 04 Mar 2010.

* Correspondence: Kerstin Preuschoff, University of Zurich, Social and Neural Systems Lab, Zurich, Switzerland, kerstin.preuschoff@unige.ch
