Abstract

The predictive nature of the hippocampus is thought to be useful for memory-guided cognitive behaviors. Inspired by the reinforcement learning literature, this notion has been formalized as a predictive map called the successor representation (SR). The SR captures a number of observations about hippocampal activity. However, the algorithm does not provide a neural mechanism for how such representations arise. Here, we show the dynamics of a recurrent neural network naturally calculate the SR when the synaptic weights match the transition probability matrix. Interestingly, the predictive horizon can be flexibly modulated simply by changing the network gain. We derive simple, biologically plausible learning rules to learn the SR in a recurrent network. We test our model with realistic inputs and match hippocampal data recorded during random foraging. Taken together, our results suggest that the SR is more accessible in neural circuits than previously thought and can support a broad range of cognitive functions.

Editor's evaluation

This important work provides compelling evidence for the biological plausibility of the Successor Representation (SR) algorithm. The SR is a leading computational hypothesis to explore whether neural representations are consistent with the hypothesis that the neural networks in specific brain areas perform predictive computations. Establishing a biologically plausible learning rule for SR representations to form is of high significance in the field of neuroscience.

https://doi.org/10.7554/eLife.80680.sa0

eLife digest

Memories are an important part of how we think, understand the world around us, and plan out future actions. In the brain, memories are thought to be stored in a region called the hippocampus. When memories are formed, neurons store events that occur around the same time together. This might explain why often, in the brains of animals, the activity associated with retrieving memories is not just a snapshot of what happened at a specific moment; it can also include information about what the animal might experience next. This can have a clear utility if animals use memories to predict what they might experience next and plan out future actions.

Mathematically, this notion of predictiveness can be summarized by an algorithm known as the successor representation. This algorithm describes what the activity of neurons in the hippocampus looks like when retrieving memories and making predictions based on them. However, even though the successor representation can computationally reproduce the activity seen in the hippocampus when it is making predictions, it is unclear what biological mechanisms underpin this computation in the brain.

Fang et al. approached this problem by trying to build a model that could generate the same activity patterns computed by the successor representation using only biological mechanisms known to exist in the hippocampus. First, they used computational methods to design a network of neurons that had the biological properties of neural networks in the hippocampus. They then used the network to simulate neural activity.
The results show that the activity of the network they designed was able to exactly match the successor representation. Additionally, the data resulting from the simulated activity in the network fitted experimental observations of hippocampal activity in Tufted Titmice.

One advantage of the network designed by Fang et al. is that it can generate predictions in flexible ways. That is, it can make both short- and long-term predictions from what an individual is experiencing at the moment. This flexibility means that the network can be used to simulate how the hippocampus learns in a variety of cognitive tasks. Additionally, the network is robust to different conditions. Given that the brain has to be able to store memories in many different situations, this is a promising indication that this network may be a reasonable model of how the brain learns.

The results of Fang et al. lay the groundwork for connecting biological mechanisms in the hippocampus at the cellular level to cognitive effects, an essential step to understanding the hippocampus, as well as its role in health and disease. For instance, their network may provide a concrete approach to studying how disruptions to the ways neurons make and break connections can impair memory formation. More generally, better models of the biological mechanisms involved in making computations in the hippocampus can help scientists better understand and test out theories about how memories are formed and stored in the brain.

Introduction

To learn from the past, plan for the future, and form an understanding of our world, we require memories of personal experiences. These memories depend on the hippocampus for formation and recall (Scoville and Milner, 1957; Penfield and Milner, 1958; Corkin, 2002), but an algorithmic and mechanistic understanding of memory formation and retrieval in this region remains elusive. From a computational perspective, a key function of memory is to use past experiences to inform predictions of possible futures (Bubic et al., 2010; Wayne et al., 2018; Whittington et al., 2020; Momennejad, 2020). This suggests that hippocampal memory is stored in a way that is particularly suitable for forming predictions. Consistent with this hypothesis, experimental work has shown that, across species and tasks, hippocampal activity is predictive of the future experience of an animal (Skaggs and McNaughton, 1996; Lisman and Redish, 2009; Mehta et al., 1997; Payne et al., 2021; Muller and Kubie, 1989; Pfeiffer and Foster, 2013; Schapiro et al., 2016; Garvert et al., 2017). Furthermore, theoretical work has found that models endowed with predictive objectives tend to resemble hippocampal activity (Blum and Abbott, 1996; Mehta et al., 2000; Stachenfeld et al., 2017; Momennejad et al., 2017; Geerts et al., 2020; Recanatesi et al., 2021; Whittington et al., 2020; George et al., 2021). Thus, it is clear that predictive representations are an important aspect of hippocampal memory.

Inspired by work in the reinforcement learning (RL) field, these observations have been formalized by describing hippocampal activity as a predictive map under the successor representation (SR) algorithm (Dayan, 1993; Gershman et al., 2012; Stachenfeld et al., 2017). Under this framework, an animal’s experience in the world is represented as a trajectory through some defined state space, and hippocampal activity predicts the future experience of an animal by integrating over the likely states that an animal will visit given its current state.
This algorithm further explains how, in addition to episodic memory, the hippocampus may support relational reasoning and decision making (Recanatesi et al., 2021; Mattar and Daw, 2018), consistent with differences in hippocampal representations in different tasks (Markus et al., 1995; Jeffery, 2021). The SR framework captures many experimental observations of neural activity, leading to a proposed computational function for the hippocampus (Stachenfeld et al., 2017).

While the SR algorithm convincingly argues for a computational function of the hippocampus, it is unclear what biological mechanisms might compute the SR in a neural circuit. Thus, several relevant questions remain that are difficult to probe with the current algorithm. What kind of neural architecture should one expect in a region that can support this computation? Are there distinct forms of plasticity and neuromodulation needed in this system? What is the structure of hippocampal inputs to be expected? A biologically plausible model can explore these questions and provide insight into both mechanism and function (Marr and Poggio, 1976; Frank, 2015; Love, 2021). In other systems, it has been possible to derive biological mechanisms with the goal of achieving a particular network function or property (Zeldenrust et al., 2021; Karimi et al., 2022; Pehlevan et al., 2017; Olshausen and Field, 1996; Burbank, 2015; Aitchison et al., 2021; Földiák, 1990; Tyulmankov et al., 2022). Key to many of these models is the constraint that learning rules at any given neuron can only use information local to that neuron.

A promising direction towards such a neural model of the SR is to use the dynamics of a recurrent neural network (RNN) to perform SR computations (Vértes and Sahani, 2019; Russek et al., 2017). An RNN model is particularly attractive as the hippocampus is highly recurrent, and its connectivity patterns are thought to support associative learning and recall (Gardner-Medwin, 1976; McNaughton and Morris, 1987; Marr et al., 1991; Liu et al., 2012). However, an RNN model of the SR has not been tied to neural learning rules that support its operation and allow for testing of specific hypotheses.

Here, we show that an RNN with local learning rules and an adaptive learning rate exactly calculates the SR at steady state. We test our model with realistic inputs and make comparisons to neural data. In addition, we compare our results to the standard SR algorithm with respect to the speed of learning and the learned representations in cases where multiple solutions exist. Our work provides a mechanistic account for an algorithm that has been frequently connected to the hippocampus, but could only be interpreted at an algorithmic level. This network-level perspective allows us to make specific predictions about hippocampal mechanisms and activity.

Results

The successor representation

The SR algorithm described in Stachenfeld et al., 2017 first discretizes the environment explored by an animal (whether a physical or abstract space) into a set of N states that the animal transitions through over time (Figure 1A). The animal’s behavior can then be thought of as a Markov chain with a corresponding transition probability matrix $T \in \mathbb{R}^{N \times N}$ (Figure 1B). T gives the probability that the animal transitions to a state s′ from the state s in one time step: $T_{ji} = P(s' = i \mid s = j)$. The SR matrix is defined as

(1) $M = \sum_{t=0}^{\infty} \gamma^t T^t = (I - \gamma T)^{-1}$

Here, γ∈(0,1) is a temporal discount factor.
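To make Equation 1 concrete, the short sketch below computes the SR matrix from a transition matrix both in closed form and as a truncated sum. The six-state ring and the transition probabilities are illustrative choices of ours, not the simulation parameters used in the paper.

```python
import numpy as np

# Hypothetical 6-state circular track with a forward-biased random walk
# (illustrative parameters only).
n_states = 6
T = np.zeros((n_states, n_states))
for s in range(n_states):
    T[s, (s + 1) % n_states] = 0.6   # step forward
    T[s, s] = 0.3                    # stay in place
    T[s, (s - 1) % n_states] = 0.1   # step backward

gamma = 0.6

# Closed form: M = (I - gamma * T)^(-1)
M_closed = np.linalg.inv(np.eye(n_states) - gamma * T)

# Truncated series: M ~ sum_t gamma^t T^t
M_series = sum((gamma ** t) * np.linalg.matrix_power(T, t) for t in range(50))

assert np.allclose(M_closed, M_series, atol=1e-6)
print(M_closed[3])   # SR of state 4: discounted expected occupancy of each state
```

Each row of M sums to 1/(1−γ), a quick sanity check that the discounted occupancies are consistent with Equation 1.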
$M_{ji}$ can be seen as a measure of the occupancy of state i over time if the animal starts at state j, with γ controlling how much to discount time steps in the future (Figure 1C). The SR of state j is the jth row of M and represents the states that an animal is likely to transition to from state j. Stachenfeld et al., 2017 demonstrate that, if one assumes each state drives a single neuron, the SR of j resembles the population activity of hippocampal neurons when the animal is at state j (Figure 1D). They also show that the ith column of M resembles the place field (activity as a function of state) of a hippocampal neuron representing state i (Figure 1E). In addition, the ith column of M shows which states are likely to lead to state i.

Figure 1. The successor representation and an analogous recurrent network model. (A) The behavior of an animal running down a linear track can be described as a transition between discrete states where the states encode spatial location. (B) By counting the transitions between different states, the behavior of an animal can be summarized in a transition probability matrix T. (C) The successor representation matrix is defined as $M = \sum_{t=0}^{\infty} \gamma^t T^t$. Here, M is shown for γ=0.6. Dashed boxes indicate the slices of M shown in (D) and (E). (D) The fourth row of the M matrix describes the activity of each state-encoding neuron when the animal is at the fourth state. (E) The fourth column of the M matrix describes the place field of the neuron encoding the fourth state. (F) Recurrent network model of the SR (RNN-S). The current state of the animal is one-hot encoded by a layer of input neurons. Inputs connect one-to-one onto RNN neurons with synaptic connectivity matrix J=T⊺. The activity of the RNN neurons is represented by x. SR activity is read out from one-to-one connections from the RNN neurons to the output neurons. The example here shows inputs and outputs when the animal is at state 4. (G) Feedforward neural network model of the SR (FF-TD). The M matrix is encoded in the weights from the input neurons to the output layer neurons, where the SR activity is read out. (H) Diagram of the terms used for the RNN-S learning rule. Terms in red are used for potentiation while terms in blue are used for normalization (Equation 4). (I) As in (H) but for the feedforward-TD model (Equation 11). To reduce the notation indicating time steps, we use ′ in place of (t) and no added notation for (t−1).

Recurrent neural network computes SR at steady state

We begin by drawing connections between the SR algorithm (Stachenfeld et al., 2017) and an analogous neural network architecture. The input to the network encodes the current state of the animal and is represented by a layer of input neurons (Figure 1F, G). These neurons feed into the rest of the network that computes the SR (Figure 1F, G). The SR is then read out by a layer of output neurons so that downstream systems receive a prediction of the upcoming states (Figure 1F, G). We will first model the inputs ϕ as one-hot encodings of the current state of the animal (Figure 1F, G). That is, each input neuron represents a unique state, and input neurons are one-to-one connected to the hidden neurons.

We first consider an architecture in which a recurrent neural network (RNN) is used to compute the SR (Figure 1F). Let us assume that the T matrix is encoded in the synaptic weights of the RNN.
In this case, the steady state activity of the network in response to input ϕ retrieves a row of the SR matrix, M⊺ϕ (Figure 1F, subsection 4.14). Intuitively, this is because each recurrent iteration of the RNN progresses the prediction by one transition. In other words, the tth recurrent iteration raises T to the tth power as in Equation 1.

To formally derive this result, we start by defining the dynamics of our RNN with classical rate network equations (Amari, 1972). At time t, the firing rate x(t) of N neurons given each neuron’s input ϕ(t) follows the discrete-time dynamics (assuming a step size $\Delta t = 1$)

(2) $\Delta x = -x(t) + \gamma J f(x(t)) + \phi(t)$

Here, γ scales the recurrent activity and is a constant factor for all neurons. The synaptic weight matrix $J \in \mathbb{R}^{N \times N}$ is defined such that $J_{ij}$ is the synaptic weight from neuron j to neuron i. Notably, this notation is transposed from what is used in the RL literature, where conventions have the first index as the starting state. Generally, f is some nonlinear function in Equation 2. For now, we will consider f to be the identity function, rendering this equation linear. Under this assumption, we can solve for the steady state activity $x_{ss}$ as

(3) $x_{ss} = (I - \gamma J)^{-1} \phi$

Equivalence between Equation 1 and Equation 3 is clearly reached when J=T⊺ (Russek et al., 2017; Vértes and Sahani, 2019). Thus, if the network can learn T in its synaptic weight matrix, it will exactly compute the SR.

Here, the factor γ represents the gain of the neurons in the network, which is factored out of the synaptic strengths characterized by J. Thus, γ is an independently adjustable factor that can flexibly control the strength of the recurrent dynamics (see Sompolinsky et al., 1988). A benefit of this flexibility is that the system can retrieve successor representations of varying predictive strengths by modulating the gain factor γ. In this way, the predictive horizon can be dynamically controlled without any additional learning required. We will refer to the γ used during learning of the SR as the baseline γ, or γB.

We next consider what is needed in a learning rule such that J approximates T⊺. In order to learn a transition probability matrix, a learning rule must associate states that occur sequentially and normalize the synaptic weights into a valid probability distribution. We derive a learning rule that addresses both requirements (Figure 1H, Appendix 2),

(4) $\Delta J_{ij} = \eta\, x_i(t)\, x_j(t-1) - \eta\, x_j(t-1) \sum_k J_{ik}\, x_k(t-1)$,

where η is the learning rate. The first term in Equation 4 is a temporally asymmetric potentiation term that counts states that occur in sequence. This is similar to spike-timing dependent plasticity, or STDP (Bi and Poo, 1998; Skaggs and McNaughton, 1996; Abbott and Blum, 1996). The second term in Equation 4 is a form of synaptic depotentiation. Depotentiation has been hypothesized to be broadly useful for stabilizing patterns and sequence learning (Földiák, 1990; Fiete et al., 2010), and similar inhibitory effects are known to be elements of hippocampal learning (Kullmann and Lamsa, 2007; Lamsa et al., 2007). In our model, the depotentiation term in Equation 4 imposes local anti-Hebbian learning at each neuron; that is, each column of J is normalized independently. This normalizes the observed transitions from each state by the number of visits to that state, such that transition statistics are correctly captured. We note, however, that other ways of column-normalizing the synaptic weight matrix may give similar representations (Appendix 7).
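The following is a minimal sketch of both pieces: retrieving the SR through the linear recurrent dynamics (Equations 2 and 3) with J=T⊺, and learning J with the local update of Equation 4 along a sampled trajectory. The ring environment, parameter values, and variable names are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, gamma, eta = 6, 0.6, 0.01          # illustrative network size, gain, learning rate

# Assumed ground-truth transition matrix: forward-biased walk on a ring.
T = np.zeros((n, n))
for s in range(n):
    T[s, (s + 1) % n], T[s, s], T[s, (s - 1) % n] = 0.6, 0.3, 0.1

# Retrieval: iterate Equation 2 (f = identity, step size 1) to steady state with J = T^T.
J = T.T
phi = np.eye(n)[3]                     # one-hot input: animal at state 4
x = np.zeros(n)
for _ in range(200):
    x = x + (-x + gamma * J @ x + phi)
M = np.linalg.inv(np.eye(n) - gamma * T)
assert np.allclose(x, M.T @ phi, atol=1e-6)   # steady state equals a row of the SR

# Learning: local update of Equation 4 applied to one-hot activity along a random walk.
J_learn = np.zeros((n, n))
s_prev = 0
for _ in range(50_000):
    s = rng.choice(n, p=T[s_prev])
    x_prev, x_now = np.eye(n)[s_prev], np.eye(n)[s]
    # potentiation (pre at t-1, post at t) minus the column-normalizing depotentiation
    J_learn += eta * (np.outer(x_now, x_prev) - np.outer(J_learn @ x_prev, x_prev))
    s_prev = s
print(np.abs(J_learn - T.T).max())     # approaches 0 up to sampling noise
```

With one-hot inputs, the update in Equation 4 reduces each column of J to an exponential moving average of the transitions observed out of the corresponding state, which is why J converges to T⊺.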
Crucially, the update rule (Equation 4) uses information local to each neuron (Figure 1H), an important aspect of biologically plausible learning rules. We show that, in the asymptotic limit, the update rule extracts information about the inputs ϕ and learns T exactly despite having access only to neural activity x (Appendix 3). We will refer to an RNN using Equation 4 as the RNN-Successor, or RNN-S. Combined with recurrent dynamics (Equation 3), RNN-S computes the SR exactly (Figure 1H).

As an alternative to the RNN-S model, we consider the conditions necessary for a feedforward neural network to compute the SR. Under this architecture, the M matrix must be encoded in the weights from the input neurons to the hidden layer neurons (Figure 1G). This can be achieved by updating the synaptic weights with a temporal difference (TD) learning rule, the standard update used to learn the SR in the usual algorithm. Although the TD update learns the SR, it requires information about multiple input layer neurons to make updates for the synapse from input neuron j to output neuron i (Figure 1I). Thus, it is useful to explore other possible mechanisms that are simpler to compute locally. We refer to the model described in Figure 1I as the feedforward-TD (FF-TD) model. The FF-TD model implements the canonical SR algorithm.

Evaluating SR learning by biologically plausible learning rules

To evaluate the effectiveness of the RNN-S learning rule, we tested its accuracy in learning the SR matrix for random walks. Specifically, we simulated random walks with different transition biases in a 1D circular track environment (Figure 2A). The RNN-S can learn the SR for these random walks (Figure 2B).

Figure 2 (with 1 supplement). Comparing the effects of an adaptive learning rate and plasticity kernels in RNN-S. (A) Sample one-minute segments from random walks on a 1 meter circular track. Possible actions in this 1D walk are to move forward, stay in one place, or move backward. Action probabilities are uniform (top), biased to move forward (middle), or biased to stay in one place (bottom). (B) M matrices estimated by the RNN-S model in the full random walks from (A). (C) The proposed learning rate normalization. The learning rate $\eta_j$ for synapses out of neuron j changes as a function of its activity $x_j$ and recency bias λ. Dotted lines are at [0.0, 0.5, 1.0]. (D) The mean row sum of T over time computed by the RNN-S with an adaptive learning rate (blue) or the RNN-S with static learning rates (orange). Darker lines indicate larger static learning rates. Lines show the average over 5 simulations from walks with a forward bias, and shading shows 95% confidence interval. A correctly normalized T matrix should have a row sum of 1.0. (E) As in (D), but for the mean absolute error in estimating T. (F) As in (E), but for mean absolute error in estimating the real M, and with performance of FF-TD included, with darker lines indicating slower learning rates for FF-TD. (G) Lap-based activity map of a neuron from RNN-S with static learning rate $\eta = 10^{-1.5}$. The neuron encodes the state at 45 cm on a circular track. The simulated agent is moving according to forward-biased transition statistics. (H) As in (G), but for RNN-S with adaptive learning rate. (I) The learning rate over time for the neuron in (G) (orange) and the neuron in (H) (blue). (J) Mean-squared error (MSE) at the end of meta-learning for different plasticity kernels.
The pre→post (K+) and post→pre (K−) sides of each kernel were modeled by $A e^{-t/\tau}$. Heatmap indices indicate the values that the τs were fixed to. Here, K+ is always a positive function (i.e., A was positive), because performance was uniformly poor when K+ was negative. K− could be either positive (left, “Post → Pre Potentiation”) or negative (right, “Post → Pre Depression”). Regions where the learned value for A was negligibly small were set to high errors. Errors are max-clipped at 0.03 for visualization purposes. 40 initializations were used for each K+ and K− pairing, and the heatmap shows the minimum error achieved over all initializations. (K) Plasticity kernels chosen from the areas of lowest error in the grid search from (J). Left is post → pre potentiation. Right is post → pre depression. Kernels are normalized by the maximum, and dotted lines are at one-second intervals.

Because equivalence is only reached in the asymptotic limit of learning (i.e., $\Delta J \to 0$), our RNN-S model learns the SR slowly. In contrast, animals are thought to be able to learn the structure of an environment quickly (Zhang et al., 2021), and neural representations in an environment can also develop quickly (Monaco et al., 2014; Sheffield and Dombeck, 2015; Bittner et al., 2015). To remedy this, we introduce a dynamic learning rate that allows for faster normalization of the synaptic weight matrix, similar to the formula for calculating a moving average (Appendix 4). For each neuron, suppose that a trace n of its recent activity is maintained with some time constant λ∈(0,1),

(5) $n(t) = \sum_{t' < t} \lambda^{(t - t')} x(t')$

If the learning rate of the outgoing synapses from each neuron j is inversely proportional to $n_j$ (that is, $\eta = 1/n_j(t)$), the update equation quickly normalizes the synapses to maintain a valid transition probability matrix (Appendix 4). Modulating synaptic learning rates as a function of neural activity is consistent with experimental observations of metaplasticity (Abraham and Bear, 1996; Abraham, 2008; Hulme et al., 2014). We refer to this as an adaptive learning rate and contrast it with the previous static learning rate. We consider the setting where λ=1, so the learning rate monotonically decreases over time (Figure 2C). In general, however, the learning rate could increase or decrease over time if λ<1 (Figure 2C), and n could be reset, allowing for rapid learning. Our learning rule with the adaptive learning rate is the same as in Equation 4, with the exception that $\eta = \min(1/n_j(t), 1.0)$ for synapses $J_{*j}$. This learning rule still relies only on information local to the neuron as in Figure 1H.

The RNN-S with an adaptive learning rate normalizes the synapses more quickly than a network with a static learning rate (Figure 2D, Figure 2—figure supplement 1) and learns T faster (Figure 2E, Figure 2—figure supplement 1). The RNN-S with a static learning rate exhibits more of a tradeoff between normalizing synapses quickly (Figure 2D, Figure 2—figure supplement 1A) and learning M accurately (Figure 2E, Figure 2—figure supplement 1). However, both versions of the RNN-S estimate M more quickly than the FF-TD model (Figure 2F, Figure 2—figure supplement 1).

Place fields can form quickly, but over time the place fields may skew if transition statistics are consistently biased (Stachenfeld et al., 2017; Monaco et al., 2014; Sheffield and Dombeck, 2015; Bittner et al., 2015). The adaptive learning rate recapitulates both of these effects, which are thought to be caused by fast and slow learning processes, respectively.
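A minimal sketch of the adaptive learning rate with λ=1, layered on the same one-hot update as above; the per-neuron trace, walk parameters, and step counts below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
T = np.zeros((n, n))                   # same illustrative ring walk as above
for s in range(n):
    T[s, (s + 1) % n], T[s, s], T[s, (s - 1) % n] = 0.6, 0.3, 0.1

J = np.zeros((n, n))
trace = np.zeros(n)                    # n_j: per-neuron activity trace (lambda = 1)
s_prev = 0
for step in range(500):
    s = rng.choice(n, p=T[s_prev])
    x_prev, x_now = np.eye(n)[s_prev], np.eye(n)[s]
    trace += x_prev                                            # Equation 5 with lambda = 1
    eta_j = np.minimum(1.0 / np.maximum(trace, 1e-12), 1.0)    # eta = min(1 / n_j, 1)
    # Equation 4, with the learning rate set per presynaptic neuron j
    J += (np.outer(x_now, x_prev) - np.outer(J @ x_prev, x_prev)) * eta_j[None, :]
    s_prev = s
    if step == 20:
        # Every column of J that has been visited already sums to 1, i.e. the
        # estimated transition probabilities are normalized after a handful of steps.
        print("column sums after 21 steps:", J.sum(axis=0).round(3))
print("final max error vs T^T:", np.abs(J - T.T).max())
```

Because $\eta_j$ is the reciprocal of the visit count, the first visit to a state writes in a full-strength (one-shot) estimate, while later visits make progressively smaller corrections, matching the fast-then-slow behavior described above.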
A low learning rate can capture the biasing of place fields, which develops over many repeated experiences. This is seen in the RNN-S with a static learning rate (Figure 2G). However, a high learning rate is needed for hippocampal place cells to develop sizeable place fields in one shot. Both these effects of slow and fast learning can be seen in the neural activity of an example RNN-S neuron with an adaptive learning rate (Figure 2H). After the first lap, a sizeable field is induced in a one-shot manner, centered at the cell’s preferred location. In subsequent laps, the place field slowly distorts to reflect the bias of the transition statistics (Figure 2H). The model is able to capture these learning effects because the adaptive learning rate transitions between high and low learning rates, unlike the static version (Figure 2I).

Thus far, we have assumed that the RNN-S learning rule uses pre→post activity over two neighboring time steps (Equation 4). A more realistic framing is that a convolution with a plasticity kernel determines the weight change at any synapse. We tested how this affects our model and what range of plasticity kernels best supports the estimation of the SR. We do this by replacing the pre→post potentiation term in Equation 4 with a convolution:

(6) $\Delta J_{ij} = x_i(t) \sum_{t'=-\infty}^{t} K_{+}(t - t')\, x_j(t') + x_j(t) \sum_{t'=-\infty}^{t} K_{-}(t - t')\, x_i(t') - \eta\, x_j(t-1) \sum_k J_{ik}\, x_k(t-1)$

In the above equation, the full kernel K is split into a pre→post kernel (K+) and a post→pre kernel (K−). K+ and K− are parameterized as independent exponential functions, $A e^{-t/\tau}$. To systematically explore the space of plasticity kernels that can be used to learn the SR, we performed a grid search over the sign and the time constants of the pre→post and post→pre sides of the plasticity kernels. For each fixed sign and time constant, we used an evolutionary algorithm to learn the remaining parameters that determine the plasticity kernel. We find that plasticity kernels that are STDP-like are more effective than others, although plasticity kernels with slight post→pre potentiation work as well (Figure 2J). The network is sensitive to the time constant and tends to find solutions for time constants around a few hundred milliseconds (Figure 2J, K). Our robustness analysis indicates the timescale of a plasticity rule in such a circuit may be longer than expected by standard STDP, but within the timescale of changes in behavioral states. We note that this also contrasts with behavioral timescale plasticity (Bittner et al., 2015), which integrates over a window that is several seconds long. Finally, we see that even plasticity kernels with slightly different time constants may give results with minimal error from the SR matrix, even if they do not estimate the SR exactly (Figure 2J). This suggests that, although other plasticity rules could be used to model long-horizon predictions, the SR is a reasonable, although not strictly unique, model to describe this class of predictive representations.

RNN-S can compute the SR with arbitrary γR under a stable regime of γB

We next investigate how robust the RNN-S model is to the value of γ. Typically, for purposes of fitting neural data or for RL simulations, γ will take on values as high as 0.9 (Stachenfeld et al., 2017; Barreto et al., 2017). However, previous work that used RNN models reported that recurrent dynamics become unstable if the gain γ exceeds a critical value (Sompolinsky et al., 1988; Zhang et al., 2021).
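As a preview of this section's claim, the sketch below retrieves SRs with several predictive horizons from one fixed weight matrix by changing only the retrieval gain, and shows the dynamics failing to settle once the gain pushes the spectral radius of γJ to 1 or beyond. It assumes an already-learned J=T⊺ for the illustrative ring walk used above; the empirical threshold on γB during learning reported next is a separate finding, not reproduced here.

```python
import numpy as np

n = 6
T = np.zeros((n, n))                   # illustrative ring walk from the earlier sketches
for s in range(n):
    T[s, (s + 1) % n], T[s, s], T[s, (s - 1) % n] = 0.6, 0.3, 0.1
J = T.T                                # assume learning has already converged to J = T^T
phi = np.eye(n)[3]                     # query: animal at state 4

def retrieve(gamma_R, n_iter=500):
    """Linear recurrent dynamics (Equation 2, f = identity) with retrieval gain gamma_R."""
    x = np.zeros(n)
    for _ in range(n_iter):
        x = gamma_R * J @ x + phi
    return x

for gamma_R in (0.3, 0.6, 0.9):
    # In the stable regime, the steady state matches the SR with discount gamma_R,
    # even though the weights were never relearned.
    M_R = np.linalg.inv(np.eye(n) - gamma_R * T)
    print(gamma_R, np.allclose(retrieve(gamma_R), M_R.T @ phi, atol=1e-6))

# Past the critical gain (spectral radius of gamma_R * J >= 1), activity grows without
# bound instead of settling to a steady state.
print("activity norm at gamma_R = 1.1:", np.linalg.norm(retrieve(1.1)))
```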
This could be problematic as we show analytically that the RNN-S update rule is effective only when the network dynamics are stable and do not have non-normal amplification (Appendix 2). If these conditions are not satisfied during learning, the update rule no longer optimizes for fitting the SR and the learned weight matrix will be incorrect.

We first test how the value of γB, the gain of the network during learning, affects the RNN-S dynamics. The dynamics become unstable when γB exceeds 0.6 (Figure 3—figure supplement 1). Specifically, the eigenvalues of the synaptic weight matrix exceed the critical threshold for learning when γB > 0.6 (Figure 3A, ‘Linear’). As expected from our analytical results, the stability of the network is tied to the network’s ability to estimate M. RNN-S cannot estimate M well when γB > 0.6 (Figure 3B, ‘Linear’). We explored two strategies to enable RNN-S to learn at high γ.

Figure 3 (with 1 supplement). RNN-S requires a stable choice of γB during learning, and can compute the SR with any γR. (A) Maximum real eigenvalue of the J matrix at the end of random walks under different γB. The network dynamics were either fully linear (solid) or had a tanh nonlinearity (dashed). Red line indicates the transition into an unstable regime. 45 simulations were run for each γB, line indicates mean, and shading shows 95% confidence interval. (B) MAE of M matrices learned by RNN-S with different γB. RNN-S was simulated with linear dynamics (solid line) or with a tanh nonlinearity added to the recurrent dynamics (dashed line). Test datasets used various biases in action probability selection. (C) M matrix learned by RNN-S with tanh nonlinearity added in the recurrent dynamics. A forward-biased walk on a circular track was simulated, and γB = 0.8. (D) The true M matrix of the walk used to generate (C). (E) Simulated population activity over the first ten
