Editor's evaluation: Neural population dynamics of computing with synaptic modulations

Gianluigi Mongillo

doi:10.7554/elife.83035.sa0

Abstract

Article Figures and data Abstract Editor's evaluation Introduction Results Discussion Methods Appendix 1 Data availability References Decision letter Author response Article and author information Metrics Abstract In addition to long-timescale rewiring, synapses in the brain are subject to significant modulation that occurs at faster timescales that endow the brain with additional means of processing information. Despite this, models of the brain like recurrent neural networks (RNNs) often have their weights frozen after training, relying on an internal state stored in neuronal activity to hold task-relevant information. In this work, we study the computational potential and resulting dynamics of a network that relies solely on synapse modulation during inference to process task-relevant information, the multi-plasticity network (MPN). Since the MPN has no recurrent connections, this allows us to study the computational capabilities and dynamical behavior contributed by synapses modulations alone. The generality of the MPN allows for our results to apply to synaptic modulation mechanisms ranging from short-term synaptic plasticity (STSP) to slower modulations such as spike-time dependent plasticity (STDP). We thoroughly examine the neural population dynamics of the MPN trained on integration-based tasks and compare it to known RNN dynamics, finding the two to have fundamentally different attractor structure. We find said differences in dynamics allow the MPN to outperform its RNN counterparts on several neuroscience-relevant tests. Training the MPN across a battery of neuroscience tasks, we find its computational capabilities in such settings is comparable to networks that compute with recurrent connections. Altogether, we believe this work demonstrates the computational possibilities of computing with synaptic modulations and highlights important motifs of these computations so that they can be identified in brain-like systems. Editor's evaluation The study shows that fast and transient modifications of the synaptic efficacies, alone, can support the storage and processing of information over time. Convincing evidence is provided by showing that feed-forward networks, when equipped with such short-term synaptic modulations, perform a wide variety of tasks at a performance level comparable with that of recurrent networks. The results of the study are valuable to both neuroscientists and researchers in machine learning. https://doi.org/10.7554/eLife.83035.sa0 Decision letter Reviews on Sciety eLife's review process Introduction The brain’s synapses constantly change in response to information under several distinct biological mechanisms (Love, 2003; Hebb, 2005; Bailey and Kandel, 1993; Markram et al., 1997; Bi and Poo, 1998; Stevens and Wang, 1995; Markram and Tsodyks, 1996). These changes can serve significantly different purposes and occur at drastically different timescales. Such mechanisms include synaptic rewiring, which modifies the topology of connections between neurons in our brain and can be as fast as minutes to hours. Rewiring is assumed to be the basis of long-term memory that can last a lifetime (Bailey and Kandel, 1993). At faster timescales, individual synapses can have their strength modified (Markram et al., 1997; Bi and Poo, 1998; Stevens and Wang, 1995; Markram and Tsodyks, 1996). These changes can occur over a spectrum of timescales and can be intrinsically transient (Stevens and Wang, 1995; Markram and Tsodyks, 1996). Though such mechanisms may not immediately lead to structural changes, they are thought to be vital to the brain’s function. For example, short-term synaptic plasticity (STSP) can affect synaptic strength on timescales less than a second, with such effects mainly presynaptic-dependent (Stevens and Wang, 1995; Tsodyks and Markram, 1997). At slower timescales, long-term potentiation (LTP) can have effects over minutes to hours or longer, with the early phase being dependent on local signals and the late phase including a more complex dependence on protein synthesis (Baltaci et al., 2019). Also on the slower end, spike-time-dependent plasticity (STDP) adjusts the strengths of connections based on the relative timing of pre- and postsynaptic spikes (Markram et al., 1997; Bi and Poo, 1998; McFarlan et al., 2023). In this work, we investigate a new type of artificial neural network (ANN) that uses biologically motivated synaptic modulations to process short-term sequential information. The multi-plasticity network (MPN) learns using two complementary plasticity mechanisms: (1) long-term synaptic rewiring via standard supervised ANN training and (2) simple synaptic modulations that operate at faster timescales. Unlike many other neural network models with synaptic dynamics (Tsodyks et al., 1998; Mongillo et al., 2008; Lundqvist et al., 2011; Barak and Tsodyks, 2014; Orhan and Ma, 2019; Ballintyn et al., 2019; Masse et al., 2019), the MPN has no recurrent synaptic connections, and thus can only rely on modulations of synaptic strengths to pass short-term information across time. Although both recurrent connections and synaptic modulation are present in the brain, it can be difficult to isolate how each of these affects temporal computation. The MPN thus allows for an in-depth study of the computational power of synaptic modulation alone and how the dynamics behind said computations may differ from networks that rely on recurrence. Having established how modulations alone compute, we believe it will be easier to disentangle synaptic computations from brain-like networks that may compute using a combination of recurrent connections, synaptic dynamics, neuronal dynamics, etc. Biologically, the modulations in the MPN represent a general synapse-specific change of strength on shorter timescales than the structural changes, the latter of which are represented by weight adjustment via backpropagation. We separately consider two forms of modulation mechanisms, one of which is dependent on both the pre- and postsynaptic firing rates and a second that only depends on presynaptic rates. The first of these rules is primarily envisioned as coming from associative forms of plasticity that depend on both pre- and postsynaptic neuron activity (Markram et al., 1997; Bi and Poo, 1998; McFarlan et al., 2023). Meanwhile, the second type of modulation models presynaptic-dependent STSP (Mongillo et al., 2008; Zucker and Regehr, 2002). While both these mechanisms can arise from distinct biological mechanisms and can span timescales of many orders of magnitude, the MPN uses simplified dynamics to keep the effects of synaptic modulations and our subsequent results as general as possible. It is important to note that in the MPN, as in the brain, the mechanisms that represent synaptic modulations and rewiring are not independent of one another – changes in one affect the operation of the other and vice versa. To understand the role of synaptic modulations in computing and how they can change neuronal dynamics, throughout this work we contrast the MPN with recurrent neural networks (RNNs), whose synapses/weights remain fixed after a training period. RNNs store temporal, task-relevant information in transient internal neural activity using recurrent connections and have found widespread success in modeling parts of our brain (Cannon et al., 1983; Ben-Yishai et al., 1995; Seung, 1996; Zhang, 1996; Ermentrout, 1998; Stringer et al., 2002; Xie et al., 2002; Fuhs and Touretzky, 2006; Burak and Fiete, 2009). Although RNNs model the brain’s significant recurrent connections, the weights in these networks neglect the role transient synaptic dynamics can have in adjusting synaptic strengths and processing information. Considerable progress has been made in analyzing brain-like RNNs as population-level dynamical systems, a framework known as neural population dynamics (Vyas et al., 2020). Such studies have revealed a striking universality of the underlying computational scaffold across different types of RNNs and tasks (Maheswaranathan et al., 2019b). To elucidate how computation through synaptic modulations affect neural population behavior, we thoroughly characterize the MPN’s low-dimensional behavior in the neural population dynamics framework (Vyas et al., 2020). Using a novel approach of analyzing the synapse population behavior, we find the MPN computes using completely different dynamics than its RNN counterparts. We then explore the potential benefits behind its distinct dynamics on several neuroscience-relevant tasks. Contributions The primary contributions and findings of this work are as follows: We elucidate the neural population dynamics of the MPN trained on integration-based tasks and show it operates with qualitatively different dynamics and attractor structure than RNNs. We support this with analytical approximations of said dynamics. We show how the MPN’s synaptic modulations allow it to store and update information in its state space using a task-independent, single point-like attractor, with dynamics slower than task-relevant timescales. Despite its simple attractor structure, for integration-based tasks, we show the MPN performs at level comparable or exceeding RNNs on several neuroscience-relevant measures. The MPN is shown to have dynamics that make it a more effective reservoir, less susceptible to catastrophic forgetting, and more flexible to taking in new information than RNN counterparts. We show the MPN is capable of learning more complex tasks, including contextual integration, continuous integration, and 19 neuroscience tasks in the NeuroGym package (Molano-Mazon et al., 2022). For a subset of tasks, we elucidate the changes in dynamics that allow the network to solve them. Related work Networks with synaptic dynamics have been investigated previously (Tsodyks et al., 1998; Mongillo et al., 2008; Sugase-Miyamoto et al., 2008; Lundqvist et al., 2011; Barak and Tsodyks, 2014; Orhan and Ma, 2019; Ballintyn et al., 2019; Masse et al., 2019; Hu et al., 2021; Tyulmankov et al., 2022; Tyulmankov et al., 2022; Rodriguez et al., 2022). As we mention above, many of these works investigate networks with both synaptic dynamics and recurrence (Tsodyks et al., 1998; Mongillo et al., 2008; Lundqvist et al., 2011; Barak and Tsodyks, 2014; Orhan and Ma, 2019; Ballintyn et al., 2019; Masse et al., 2019), whereas here we are interested in investigating the computational capabilities and dynamical behavior of computing with synapse modulations alone. Unlike previous works that examine computation solely through synaptic changes, the MPN’s modulations occur at all times and do not require a special signal to activate their change (Sugase-Miyamoto et al., 2008). The networks examined in this work are most similar to the recently introduced ‘HebbFF’ (Tyulmankov et al., 2022) and ‘STPN’ (Rodriguez et al., 2022) that also examine computation through continuously updated synaptic modulations. Our work differs from these studies in that we focus on elucidating the neural population dynamics of such networks, contrasting them to known RNN dynamics, and show why this difference in dynamics may be beneficial in certain neuroscience-relevant settings. Additionally, the MPN uses a multiplicative modulation mechanism rather than the additive modulation of these two works, which in some settings we investigate yields significant performance differences. The exact form of the synaptic modulation updates were originally inspired by ‘fast weights’ used in machine learning for flexible learning (Ba et al., 2016). However, in the MPN, both plasticity rules apply to the same weights rather than different ones, making it more biologically realistic. This work largely focuses on understanding computation through a neural population dynamics-like analysis (Vyas et al., 2020). In particular, we focus on the dynamics of networks trained on integration-based tasks, that have previously been studied in RNNs (Maheswaranathan et al., 2019b; Maheswaranathan et al., 2019a; Maheswaranathan and Sussillo, 2020; Aitken et al., 2020). These studies have demonstrated a degree of universality of the underlying computational structure across different types of tasks and RNNs (Maheswaranathan et al., 2019b). Due to the MPN’s dynamic weights, its operation is fundamentally different than said recurrent networks. Setup Throughout this work, we primarily investigate the dynamics of the MPN on tasks that require an integration of information over time. To correctly respond to said task, the network is required to both store and update its internal state as well as compare several distinct items in its memory. All tasks in this work consist of a discrete sequence of vector inputs, xt for t=1,2,…,T. For the tasks we consider presently, at time T the network is queried by a ‘go signal’ for an output, for which the correct response can depend on information from the entire input sequence. Throughout this paper, we denote vectors using lowercase bold letters, matrices by uppercase bold letters, and scalars using standard (not-bold) letters. The input, hidden, and output layers of the networks we study have d, n, and N neurons, respectively. Multi-plasticity network The multi-plasticity network (MPN) is an artificial neural network consisting of input, hidden, and output layers of neurons. It is identical to a fully-connected, two-layer, feedforward network (Figure 1, middle), with one major exception: the weights connecting the input and hidden layer are modified by the time-dependent synapse modulation (SM) matrix, M (Figure 1, left). The expression for the hidden layer activity at time step t is (1) ht=tanh⁡((Mt−1⊙Winp)xt+Winpxt) where Winp is an n-by-d weight matrix representing the network’s synaptic strengths that is fixed after training, ‘⊙’ denotes element-wise multiplication of the two matrices (the Hadamard product), and the tanh⁡(⋅) is applied element-wise. For each synaptic weight in Winp, a corresponding element of Mt−1 multiplicatively modulates its strength. Note if Mt−1=0 the first term vanishes, so the Winp are unmodified and the network simply functions as a fully connected feedforward network. Figure 1 Download asset Open asset Two neural network computational mechanisms: synaptic modulations and recurrence. Throughout this figure, neurons are represented as white circles, the black lines between neurons represent regular feedforward weights that are modified during training through gradient descent/backpropagation. From bottom to top are the input, hidden, and output layers, respectively. (Middle) A two-layer, fully connected, feedforward neural network. (Left) Schematic of the MPN. Here, the pink and black lines (between the input and hidden layer) represent weights that are modified by both backpropagation (during training) and the synapse modulation matrix (during an input sequence), see Equation 1. (Right) Schematic of the Vanilla RNN. In addition to regular feedforward weights between layers, the RNN has (fully connected) weights between its hidden layer from one time step to the next, see Equation 3. What allows the MPN to store and manipulate information as the input sequence is passed to the network is how the SM matrix, Mt, changes over time. Throughout this work, we consider two distinct modulation update rules. The primary rule we investigate is dependent upon both the pre- and postsynaptic firing rates. An alternative update rule only depends upon the presynaptic firing rate. Respectively, the SM matrix updated for these two cases takes the form (Hebb, 2005; Ba et al., 2016; Tyulmankov et al., 2022), (2a) pre.&post.:Mt=λMt−1+ηhtxtT (2b) pre. only:Mt=λMt−1+η1xtT/n, where λ and η are parameters learned during training and 1 is the n-dimensional vector of all 1s. We allow for −∞<η<∞, so the size and sign of the modulations can be optimized during training. Additionally, 0<λ<1, so the SM matrix exponentially decays at each time step, asymptotically returning to its M=0 baseline. For both rules, we define M0=0 at the start of each input sequence. Since the SM matrix is updated and passed forward at each time step, we will often refer to Mt as the state of said networks. To distinguish networks with these two modulation rules, we will refer to networks with the presynaptic only rule as MPNpre, while we reserve MPN for networks with the pre- and postsynatpic update that we primarily investigate. For brevity, and since almost all results for the MPN generalize to the simplified update rule of the MPNpre, the main text will foremost focus on results for the MPN. Results for the MPNpre are discussed only briefly or given in the supplement. As mentioned in the introduction, from a biological perspective the MPN’s modulations represent a general associative plasticity such as STDP, whereas the presynaptic-dependent modulations of the MPNpre can represent STSP. The decay induced by λ represents the return to baseline of the aforementioned processes, which all occur at a relatively slow speed to their onset (Bertram et al., 1996; Zucker and Regehr, 2002). To ensure the eventual decay of such modulations, unless otherwise stated, throughout this work we further limit λ<λmax with λmax=0.95. Additionally, we observe no major performance or dynamics difference for positive or negative η, so we do not distinguish the two throughout this work (Methods). We emphasize that the modulation mechanisms of the MPN and MPNpre could represent biological processes that occur at significantly different timescales, so although we train them on identical tasks the tasks themselves are assumed to occur at timescales that match the modulation mechanism of the corresponding network. Note that the modulation mechanisms are not independent of weight adjustment from backpropagation. Since the SM matrix is active during training, the network’s weights that are being adjusted by backpropgation (see below) are experiencing modulations, and said modulations factor into how the weights are adjusted. Lastly, the output of the MPN and MPNpre at time T is determined by a fully-connected readout matrix, yT=WROhT, where WRO is an N-by-n weight matrix adjusted during training. Throughout this work, we will view said readout matrix as N distinct n-dimensional readout vectors, that is one for each output neuron. Recurrent neural networks As discussed in the introduction, throughout this work we will compare the learned dynamics and performance of the MPN to artificial RNNs. The hidden layer activity for the simplest recurrent neural network, the Vanilla RNN, is (3) ht=tanh⁡(Wrecht−1+Winpxt+b), with Wrec the recurrent weights, an n-by-n matrix that updates the hidden neurons from one time step to the next (Figure 1, right). We also consider a more sophisticated RNN structure, the gated recurrent unit (GRU), that has additional gates to more precisely control the recurrent update of its hidden neurons (see Methods 5.2). In both these RNNs, information is stored and updated via the hidden neuron activity, so we will often refer to ht as the RNNs’ hidden state or just its state. The output of the RNNs is determined through a trained readout matrix in the same manner as the MPN above, i.e. yT=WROhT. Training The weights of the MPN, MPNpre, and RNNs will be trained using gradient descent/backpropagation through time, specifically ADAM (Kingma and Ba, 2014). All network weights are subject to L1 regularization to encourage sparse solutions (Methods 5.2). Cross-entropy loss is used as a measure of performance during training. Gaussian noise is added to all inputs of the networks we investigate. Results Network dynamics on a simple integration task Simple integration task We begin our investigation of the MPN’s dynamics by training it on a simple N-class (Through most of this work, the number of neurons in the output layer of our networks will always be equal to the number of classes in the task, so we use N to denote both unless otherwise stated). integration task, inspired by previous works on RNN integration-dynamics (Maheswaranathan et al., 2019a; Aitken et al., 2020). In this task, the network will need to determine for which of the N classes the input sequence contains the most evidence (Figure 2a). Each stimulus input, xt, can correspond to a discrete unit of evidence for one of the N classes. We also allow inputs that are evidence for none of the classes. The final input, xT, will always be a special ‘go signal’ input that tells the network an output is expected. The network’s output should be an integration of evidence over the entire input sequence, with an output activity that is largest from the neuron that corresponds to the class with the maximal accumulated evidence. (We omit sequences with two or more classes tied for the most evidence. See Methods 5.1 for additional details). Prior to adding noise, each possible input, including the go signal, is mapped to a random binary vector (Figure 2b). We will also investigate the effect of inserting a delay period between the stimulus period and the go signal, during which no input is passed to the network, other than noise (Figure 2c). Figure 2 Download asset Open asset Schematic of simple integration task. (a) Example sequence of the two-class integration task where each box represents an input. Here and throughout this work, distinct classes are represented by different colors. In this case, red and blue. The red/blue boxes represent evidence for their respective classes, while the grey box represents an input that is evidence for neither class. At the end of the sequence is the ‘go signal’ that lets the network know an output is expected. The correct response for the sequence is the class with the most evidence; in the example shown, the red class. (b) Each possible input is mapped to a (normalized) random binary vector. (c) The integration task can be modified by the insertion of a ‘delay period’ between the stimulus period and the go signal. During the delay period, the network receives no input (other than noise). We find the MPN (and MPNpre) is capable of learning the above integration task to near perfect accuracy across a wide range of class counts, sequence lengths, and delay lengths. It is the goal of this section to illuminate the dynamics behind the trained MPN that allow it to solve such a task and compare them to more familiar RNN dynamics. Here, in the main text, we will explicitly explore the dynamics of a two-class integration task, generalizations to N>2 classes are straightforward and are discussed in the Methods 5.4. We will start by considering the simplest case of integration without a delay period, revisiting the effects of delay afterwards. Before we dive into the dynamics of the MPN, we give a quick recap of the known RNN dynamics on integration-based tasks. Review of RNN integration: attractor dynamics encodes accumulated evidence Several studies, both on natural and artificial neural networks, have discovered that networks with recurrent connections develop attractor dynamics to solve integration-based tasks (Maheswaranathan et al., 2019a; Maheswaranathan and Sussillo, 2020; Aitken et al., 2020). Here, we specifically review the behavior of artificial RNNs on the aforementioned N-class integration tasks that share many qualitative features with experimental observations of natural neural networks. Note also the structure/dimensionality of the dynamics can depend on correlations between the various classes (Aitken et al., 2020), in this work we only investigate the case where the various classes are uncorrelated. RNNs are capable of learning to solve the simple integration task at near-perfect accuracy and their dynamics are qualitatively the same across several architectures (Maheswaranathan et al., 2019b; Aitken et al., 2020). Discerning the network’s behavior by looking at individual hidden neuron activity can be difficult (Figure 3a), and so it is useful to turn to a population-level analysis of the dynamics. When the number of hidden neurons is much larger than number of integration classes (n≫N), the population activity of the trained RNN primarily exists in a low-dimensional subspace of approximate dimension N−1 (Aitken et al., 2020). This is due to recurrent dynamics that create a task-dependent attractor manifold of approximate dimension N−1, and the hidden activity often operates close to said attractor. (See Methods 5.4 for a more in-depth review of these results including how approximate dimensionality is determined). In the two-class case, the RNN will operate close to a finite length line attractor. The low-dimensionality of hidden activity allows for an intuitive visualization of the dynamics using a two-dimensional PCA projection (Figure 3b). From the sample trajectories, we see the network’s hidden activity starts slightly offset from the line attractor before quickly falling towards its center. As evidence for one class over the other builds, the hidden activity encodes accumulated evidence by moving along the one-dimensional attractor (Figure 3b). The two readout vectors are roughly aligned with the two ends of the line, so the further the final hidden activity, hT, is toward one side of the attractor, the higher that class’s corresponding output and thus the RNN correctly identifies the class with the most evidence. For later reference, we note that the hidden activity of the trained RNN is not highly dependent upon the input of the present time step (Figure 3c), but instead it is the change in the hidden activity from one time step to the next, ht-ht-1, that are highly input-dependent (Figure 3c, inset). For the Vanilla RNN (GRU), we find 0.53±0.01 (0.88±0.01) of the hidden activity variance to be explained by the accumulated evidence and only 0.19±0.01 (0.29±0.01) to be explained by the present input to the network (mean±s.e., Methods 5.4). Figure 3 with 3 supplements see all Download asset Open asset Two-class integration: comparison of multi-plasticity network and RNN dynamics. (a-c) Vanilla RNN hidden neuron dynamics, see Figure 3—figure supplement 1 for GRU. (a) Hidden layer neural activity, ht, for four sample neurons of the RNN as a function of sequence time (in units of sequence index). The shaded grey region represents the stimulus period during which information should be integrated across time and the thin purple-shaded region representing the response to the go signal. (b) Hidden neuron activity, collected over 1000 input sequences, projected into their top two PCA components, colored by relative accumulated evidence between red/blue classes at time t (Methods 5.5). Also shown are PCA projections of sample trajectories (thin lines, colored by class), the red/blue class readout vector (thick lines), and the initial state (black square). (c) Same as (b), with ht now colored by input at the present time step, xt (four possibilities, see inset). The inset shows the PCA projection of ht-ht-1 as a function of the present input, xt, with the dark lines showing the average for each of the four inputs. [d-f] MPN hidden neuron dynamics, see Figure 3—figure supplement 1 for MPNpre. (d) Same as (a). (e) Same as (b). (f) Same as (c), except for inset. The inset now shows the alignment of each input-cluster with the readout vectors (Methods 5.5). [g-h] MPN synaptic modulation dynamics. (g) Same as (b), but instead of hidden neuron activity, the PCA projection of the SM matrices, Mt, collected over 1000 input sequences. Final Mt are colored slightly darker for clarity. (h) Same as (g), with a different y-axis. The inset is the same as that shown in (b), but for Mt-Mt-1. MPN hidden activity encodes inputs, not so much accumulated evidence We now turn to analyzing the hidden activity of the trained MPNs in the same manner that was done for the RNNs. The MPN trained on a two-class integration task appears to have significantly more sporadic activity in the individual components of ht (Figure 3d). We again find the hidden neuron activity to be low-dimensional, with approximate dimension 2.07±0.12 (mean±s.e.), lending it to informative visualization using a PCA projection (Methods 5.4). Unlike the RNN, we observe the hidden neuron activity to be separated into several distinct clusters (Figure 3e). Exemplar input sequences cause ht to rapidly transition between said clusters. Coloring the ht by the sequence input at the present time step, we see the different inputs are what divide the hidden activity into distinct clusters, that we hence call input-clusters (Figure 3f). That is, the hidden neuron activity is largely dependent upon the most recent input to the network, rather than the accumulated evidence as we saw for the RNN. However, within each input-cluster, we also see a variation in ht from accumulated evidence (Figure 3e). For the MPN (MPNpre), we now find only 0.21±0.01 (0.16±0.05) of the hidden activity variance to be explained by accumulated evidence and 0.87±0.01 (0.80±0.03) to be explained by the present input to the network (mean±s.e., Methods 5.4). (The MPNpre dynamics are largely the same of what we discuss here, see Sec. 5.4.2 for further discussion). With the hidden neuron activity primarily dependent upon the current input to the network, one may wonder how the MPN ultimately outputs information dependent upon the entire sequence to solve the task. Like the other possible inputs to the network, the go signal has its own distinct input-cluster within which the hidden activities vary by accumulated evidence. Amongst all input-clusters, we find the readout vectors are highly aligned with the evidence variation within the go cluster (Figure 3f, inset). The readouts are th

Full Text