Anatomical-connectivity-guided functional connectivity reveals task-relevant pathways during proactive task-switching via recurrent graph neural networks.
- Research Article
27
- 10.1109/access.2019.2942853
- Jan 1, 2019
- IEEE Access
An information dissemination network (i.e., a cascade) with a dynamic graph structure is formed when a novel idea or message spreads from person to person. Predicting the growth of cascades is one of the fundamental problems in social network analysis. Existing deep learning models for cascade prediction are primarily based on recurrent neural networks and representation on random walks or propagation paths. However, these models are not sufficient for learning the deep spatial and temporal features of an entire cascade. Therefore, a new model, called Cascade2vec, is proposed to learn the dynamic graph representation of cascades based on graph recurrent neural networks. To learn more effective graph-level representation of cascades, the current graph neural networks are improved by designing a graph residual block, which shares attention weights between nodes, and by transforming features through perception layers. Furthermore, the proposed graph neural network is integrated into a recurrent neural network to learn the temporal features between graphs. With this method, both the spatial and temporal characteristics of cascades are learned in Cascade2vec. The experimental results show that our method significantly reduces the mean squared logarithmic error and median squared logarithmic error by 16.1% and 12%, respectively, in the cascade prediction at one hour in the Microblog network dataset compared with strong baselines.
- Peer Review Report
- 10.7554/elife.83035.sa0
- Jan 8, 2023
Abstract
In addition to long-timescale rewiring, synapses in the brain are subject to significant modulation that occurs at faster timescales and endows the brain with additional means of processing information. Despite this, models of the brain like recurrent neural networks (RNNs) often have their weights frozen after training, relying on an internal state stored in neuronal activity to hold task-relevant information. In this work, we study the computational potential and resulting dynamics of a network that relies solely on synapse modulation during inference to process task-relevant information, the multi-plasticity network (MPN). Because the MPN has no recurrent connections, we can study the computational capabilities and dynamical behavior contributed by synaptic modulations alone. The generality of the MPN allows our results to apply to synaptic modulation mechanisms ranging from short-term synaptic plasticity (STSP) to slower modulations such as spike-time-dependent plasticity (STDP). We thoroughly examine the neural population dynamics of the MPN trained on integration-based tasks and compare them to known RNN dynamics, finding the two to have fundamentally different attractor structures. We find that these differences in dynamics allow the MPN to outperform its RNN counterparts on several neuroscience-relevant tests. Training the MPN across a battery of neuroscience tasks, we find its computational capabilities in such settings are comparable to those of networks that compute with recurrent connections. Altogether, we believe this work demonstrates the computational possibilities of computing with synaptic modulations and highlights important motifs of these computations so that they can be identified in brain-like systems.
Editor's evaluation
The study shows that fast and transient modifications of the synaptic efficacies, alone, can support the storage and processing of information over time. Convincing evidence is provided by showing that feed-forward networks, when equipped with such short-term synaptic modulations, perform a wide variety of tasks at a performance level comparable with that of recurrent networks. The results of the study are valuable to both neuroscientists and researchers in machine learning.
Introduction
The brain’s synapses constantly change in response to information under several distinct biological mechanisms (Love, 2003; Hebb, 2005; Bailey and Kandel, 1993; Markram et al., 1997; Bi and Poo, 1998; Stevens and Wang, 1995; Markram and Tsodyks, 1996). These changes can serve significantly different purposes and occur at drastically different timescales. Such mechanisms include synaptic rewiring, which modifies the topology of connections between neurons in our brain and can be as fast as minutes to hours. Rewiring is assumed to be the basis of long-term memory that can last a lifetime (Bailey and Kandel, 1993). At faster timescales, individual synapses can have their strength modified (Markram et al., 1997; Bi and Poo, 1998; Stevens and Wang, 1995; Markram and Tsodyks, 1996). These changes can occur over a spectrum of timescales and can be intrinsically transient (Stevens and Wang, 1995; Markram and Tsodyks, 1996). Though such mechanisms may not immediately lead to structural changes, they are thought to be vital to the brain’s function. For example, short-term synaptic plasticity (STSP) can affect synaptic strength on timescales less than a second, with such effects mainly presynaptic-dependent (Stevens and Wang, 1995; Tsodyks and Markram, 1997).
At slower timescales, long-term potentiation (LTP) can have effects over minutes to hours or longer, with the early phase being dependent on local signals and the late phase including a more complex dependence on protein synthesis (Baltaci et al., 2019). Also on the slower end, spike-time-dependent plasticity (STDP) adjusts the strengths of connections based on the relative timing of pre- and postsynaptic spikes (Markram et al., 1997; Bi and Poo, 1998; McFarlan et al., 2023). In this work, we investigate a new type of artificial neural network (ANN) that uses biologically motivated synaptic modulations to process short-term sequential information. The multi-plasticity network (MPN) learns using two complementary plasticity mechanisms: (1) long-term synaptic rewiring via standard supervised ANN training and (2) simple synaptic modulations that operate at faster timescales. Unlike many other neural network models with synaptic dynamics (Tsodyks et al., 1998; Mongillo et al., 2008; Lundqvist et al., 2011; Barak and Tsodyks, 2014; Orhan and Ma, 2019; Ballintyn et al., 2019; Masse et al., 2019), the MPN has no recurrent synaptic connections, and thus can only rely on modulations of synaptic strengths to pass short-term information across time. Although both recurrent connections and synaptic modulation are present in the brain, it can be difficult to isolate how each of these affects temporal computation. The MPN thus allows for an in-depth study of the computational power of synaptic modulation alone and how the dynamics behind said computations may differ from networks that rely on recurrence. Having established how modulations alone compute, we believe it will be easier to disentangle synaptic computations from brain-like networks that may compute using a combination of recurrent connections, synaptic dynamics, neuronal dynamics, etc. 
Biologically, the modulations in the MPN represent a general synapse-specific change of strength on shorter timescales than the structural changes, the latter of which are represented by weight adjustment via backpropagation. We separately consider two forms of modulation mechanisms, one of which is dependent on both the pre- and postsynaptic firing rates and a second that only depends on presynaptic rates. The first of these rules is primarily envisioned as coming from associative forms of plasticity that depend on both pre- and postsynaptic neuron activity (Markram et al., 1997; Bi and Poo, 1998; McFarlan et al., 2023). Meanwhile, the second type of modulation models presynaptic-dependent STSP (Mongillo et al., 2008; Zucker and Regehr, 2002). While both these mechanisms can arise from distinct biological mechanisms and can span timescales of many orders of magnitude, the MPN uses simplified dynamics to keep the effects of synaptic modulations and our subsequent results as general as possible. It is important to note that in the MPN, as in the brain, the mechanisms that represent synaptic modulations and rewiring are not independent of one another – changes in one affect the operation of the other and vice versa. To understand the role of synaptic modulations in computing and how they can change neuronal dynamics, throughout this work we contrast the MPN with recurrent neural networks (RNNs), whose synapses/weights remain fixed after a training period. RNNs store temporal, task-relevant information in transient internal neural activity using recurrent connections and have found widespread success in modeling parts of our brain (Cannon et al., 1983; Ben-Yishai et al., 1995; Seung, 1996; Zhang, 1996; Ermentrout, 1998; Stringer et al., 2002; Xie et al., 2002; Fuhs and Touretzky, 2006; Burak and Fiete, 2009). 
Although RNNs model the brain’s significant recurrent connections, the weights in these networks neglect the role transient synaptic dynamics can have in adjusting synaptic strengths and processing information. Considerable progress has been made in analyzing brain-like RNNs as population-level dynamical systems, a framework known as neural population dynamics (Vyas et al., 2020). Such studies have revealed a striking universality of the underlying computational scaffold across different types of RNNs and tasks (Maheswaranathan et al., 2019b). To elucidate how computation through synaptic modulations affects neural population behavior, we thoroughly characterize the MPN’s low-dimensional behavior in the neural population dynamics framework (Vyas et al., 2020). Using a novel approach of analyzing the synapse population behavior, we find the MPN computes using completely different dynamics than its RNN counterparts. We then explore the potential benefits behind its distinct dynamics on several neuroscience-relevant tasks.
Contributions
The primary contributions and findings of this work are as follows:
- We elucidate the neural population dynamics of the MPN trained on integration-based tasks and show it operates with qualitatively different dynamics and attractor structure than RNNs. We support this with analytical approximations of said dynamics.
- We show how the MPN’s synaptic modulations allow it to store and update information in its state space using a task-independent, single point-like attractor, with dynamics slower than task-relevant timescales.
- Despite its simple attractor structure, for integration-based tasks, we show the MPN performs at a level comparable to or exceeding RNNs on several neuroscience-relevant measures.
- The MPN is shown to have dynamics that make it a more effective reservoir, less susceptible to catastrophic forgetting, and more flexible in taking in new information than RNN counterparts.
- We show the MPN is capable of learning more complex tasks, including contextual integration, continuous integration, and 19 neuroscience tasks in the NeuroGym package (Molano-Mazon et al., 2022). For a subset of tasks, we elucidate the changes in dynamics that allow the network to solve them.
Related work
Networks with synaptic dynamics have been investigated previously (Tsodyks et al., 1998; Mongillo et al., 2008; Sugase-Miyamoto et al., 2008; Lundqvist et al., 2011; Barak and Tsodyks, 2014; Orhan and Ma, 2019; Ballintyn et al., 2019; Masse et al., 2019; Hu et al., 2021; Tyulmankov et al., 2022; Tyulmankov et al., 2022; Rodriguez et al., 2022). As we mention above, many of these works investigate networks with both synaptic dynamics and recurrence (Tsodyks et al., 1998; Mongillo et al., 2008; Lundqvist et al., 2011; Barak and Tsodyks, 2014; Orhan and Ma, 2019; Ballintyn et al., 2019; Masse et al., 2019), whereas here we are interested in investigating the computational capabilities and dynamical behavior of computing with synapse modulations alone. Unlike previous works that examine computation solely through synaptic changes, the MPN’s modulations occur at all times and do not require a special signal to activate their change (Sugase-Miyamoto et al., 2008). The networks examined in this work are most similar to the recently introduced ‘HebbFF’ (Tyulmankov et al., 2022) and ‘STPN’ (Rodriguez et al., 2022), which also examine computation through continuously updated synaptic modulations. Our work differs from these studies in that we focus on elucidating the neural population dynamics of such networks, contrast them with known RNN dynamics, and show why this difference in dynamics may be beneficial in certain neuroscience-relevant settings. Additionally, the MPN uses a multiplicative modulation mechanism rather than the additive modulation of these two works, which yields significant performance differences in some of the settings we investigate.
The exact form of the synaptic modulation updates was originally inspired by ‘fast weights’ used in machine learning for flexible learning (Ba et al., 2016). However, in the MPN, both plasticity rules apply to the same weights rather than different ones, making it more biologically realistic. This work largely focuses on understanding computation through a neural population dynamics-like analysis (Vyas et al., 2020). In particular, we focus on the dynamics of networks trained on integration-based tasks that have previously been studied in RNNs (Maheswaranathan et al., 2019b; Maheswaranathan et al., 2019a; Maheswaranathan and Sussillo, 2020; Aitken et al., 2020). These studies have demonstrated a degree of universality of the underlying computational structure across different types of tasks and RNNs (Maheswaranathan et al., 2019b). Due to the MPN’s dynamic weights, its operation is fundamentally different from that of said recurrent networks.
Setup
Throughout this work, we primarily investigate the dynamics of the MPN on tasks that require an integration of information over time. To correctly respond to said task, the network is required to both store and update its internal state as well as compare several distinct items in its memory. All tasks in this work consist of a discrete sequence of vector inputs, $\mathbf{x}_t$ for $t = 1, 2, \ldots, T$. For the tasks we consider presently, at time $T$ the network is queried by a ‘go signal’ for an output, for which the correct response can depend on information from the entire input sequence. Throughout this paper, we denote vectors using lowercase bold letters, matrices by uppercase bold letters, and scalars using standard (not-bold) letters. The input, hidden, and output layers of the networks we study have $d$, $n$, and $N$ neurons, respectively.
Multi-plasticity network
The multi-plasticity network (MPN) is an artificial neural network consisting of input, hidden, and output layers of neurons.
It is identical to a fully-connected, two-layer, feedforward network (Figure 1, middle), with one major exception: the weights connecting the input and hidden layer are modified by the time-dependent synapse modulation (SM) matrix, $\mathbf{M}$ (Figure 1, left). The expression for the hidden layer activity at time step $t$ is
$$\mathbf{h}_t = \tanh\left( (\mathbf{M}_{t-1} \odot \mathbf{W}_{\text{inp}})\,\mathbf{x}_t + \mathbf{W}_{\text{inp}}\,\mathbf{x}_t \right) \tag{1}$$
where $\mathbf{W}_{\text{inp}}$ is an $n$-by-$d$ weight matrix representing the network’s synaptic strengths that is fixed after training, ‘$\odot$’ denotes element-wise multiplication of the two matrices (the Hadamard product), and the $\tanh(\cdot)$ is applied element-wise. For each synaptic weight in $\mathbf{W}_{\text{inp}}$, a corresponding element of $\mathbf{M}_{t-1}$ multiplicatively modulates its strength. Note if $\mathbf{M}_{t-1} = 0$ the first term vanishes, so the $\mathbf{W}_{\text{inp}}$ are unmodified and the network simply functions as a fully connected feedforward network.
Figure 1. Two neural network computational mechanisms: synaptic modulations and recurrence. Throughout this figure, neurons are represented as white circles, and the black lines between neurons represent regular feedforward weights that are modified during training through gradient descent/backpropagation. From bottom to top are the input, hidden, and output layers, respectively. (Middle) A two-layer, fully connected, feedforward neural network. (Left) Schematic of the MPN. Here, the pink and black lines (between the input and hidden layer) represent weights that are modified by both backpropagation (during training) and the synapse modulation matrix (during an input sequence), see Equation 1. (Right) Schematic of the Vanilla RNN. In addition to regular feedforward weights between layers, the RNN has (fully connected) weights between its hidden layer from one time step to the next, see Equation 3.
What allows the MPN to store and manipulate information as the input sequence is passed to the network is how the SM matrix, $\mathbf{M}_t$, changes over time. Throughout this work, we consider two distinct modulation update rules.
The primary rule we investigate is dependent upon both the pre- and postsynaptic firing rates. An alternative update rule only depends upon the presynaptic firing rate. Respectively, the SM matrix update for these two cases takes the form (Hebb, 2005; Ba et al., 2016; Tyulmankov et al., 2022),
$$\text{pre. and post.:}\quad \mathbf{M}_t = \lambda \mathbf{M}_{t-1} + \eta\, \mathbf{h}_t \mathbf{x}_t^{T} \tag{2a}$$
$$\text{pre. only:}\quad \mathbf{M}_t = \lambda \mathbf{M}_{t-1} + \frac{\eta}{n}\, \mathbf{1} \mathbf{x}_t^{T} \tag{2b}$$
where $\lambda$ and $\eta$ are parameters learned during training and $\mathbf{1}$ is the $n$-dimensional vector of all 1s. We allow for $-\infty < \eta < \infty$, so the size and sign of the modulations can be optimized during training. Additionally, $0 < \lambda < 1$, so the SM matrix exponentially decays at each time step, asymptotically returning to its $\mathbf{M} = 0$ baseline. For both rules, we define $\mathbf{M}_0 = 0$ at the start of each input sequence. Since the SM matrix is updated and passed forward at each time step, we will often refer to $\mathbf{M}_t$ as the state of said networks. To distinguish networks with these two modulation rules, we will refer to networks with the presynaptic-only rule as MPNpre, while we reserve MPN for networks with the pre- and postsynaptic update that we primarily investigate. For brevity, and since almost all results for the MPN generalize to the simplified update rule of the MPNpre, the main text will focus on results for the MPN. Results for the MPNpre are discussed only briefly or given in the supplement. As mentioned in the introduction, from a biological perspective the MPN’s modulations represent a general associative plasticity such as STDP, whereas the presynaptic-dependent modulations of the MPNpre can represent STSP. The decay induced by $\lambda$ represents the return to baseline of the aforementioned processes, which all occur at a relatively slow speed compared to their onset (Bertram et al., 1996; Zucker and Regehr, 2002). To ensure the eventual decay of such modulations, unless otherwise stated, throughout this work we further limit $\lambda < \lambda_{\max}$ with $\lambda_{\max} = 0.95$.
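As a rough illustration of Equations 1, 2a, and 2b, the following is a minimal pure-Python sketch of a single MPN time step and a full sequence pass. The function names `mpn_step` and `run_mpn`, the list-of-lists matrix layout, and the `presynaptic_only` flag are assumptions made for this example and are not taken from the authors' code; training (backpropagation, the bound on λ) is not shown.

```python
import math

def mpn_step(M_prev, W_inp, x, lam, eta, presynaptic_only=False):
    """One MPN time step (hypothetical helper, not the authors' implementation).

    M_prev and W_inp are n-by-d lists of lists, x is a length-d list.
    Returns the hidden activity h_t and the updated SM matrix M_t.
    """
    n, d = len(W_inp), len(x)
    # Eq. 1: h_t = tanh((M_{t-1} * W_inp) x_t + W_inp x_t), with * element-wise.
    h = [
        math.tanh(sum((M_prev[i][j] + 1.0) * W_inp[i][j] * x[j] for j in range(d)))
        for i in range(n)
    ]
    if presynaptic_only:
        # Eq. 2b: M_t = lam * M_{t-1} + (eta / n) * 1 x_t^T (every row gets x_t).
        M = [[lam * M_prev[i][j] + eta * x[j] / n for j in range(d)] for i in range(n)]
    else:
        # Eq. 2a: M_t = lam * M_{t-1} + eta * h_t x_t^T (outer product).
        M = [[lam * M_prev[i][j] + eta * h[i] * x[j] for j in range(d)] for i in range(n)]
    return h, M

def run_mpn(W_inp, W_ro, xs, lam, eta, presynaptic_only=False):
    """Pass a sequence through the MPN, starting from M_0 = 0, and read out at time T."""
    n, d = len(W_inp), len(W_inp[0])
    M = [[0.0] * d for _ in range(n)]
    h = [0.0] * n
    for x in xs:
        h, M = mpn_step(M, W_inp, x, lam, eta, presynaptic_only)
    # Linear readout y_T = W_RO h_T.
    y = [sum(W_ro[k][i] * h[i] for i in range(n)) for k in range(len(W_ro))]
    return y, h, M
```

With `M_prev` set to all zeros, `mpn_step` reduces to an ordinary feedforward layer, which matches the remark that the network behaves as a fully connected feedforward network when the modulations vanish.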
Additionally, we observe no major difference in performance or dynamics between positive and negative $\eta$, so we do not distinguish the two throughout this work (Methods). We emphasize that the modulation mechanisms of the MPN and MPNpre could represent biological processes that occur at significantly different timescales, so although we train them on identical tasks, the tasks themselves are assumed to occur at timescales that match the modulation mechanism of the corresponding network. Note that the modulation mechanisms are not independent of weight adjustment from backpropagation. Since the SM matrix is active during training, the network’s weights that are being adjusted by backpropagation (see below) are experiencing modulations, and said modulations factor into how the weights are adjusted. Lastly, the output of the MPN and MPNpre at time $T$ is determined by a fully-connected readout matrix, $\mathbf{y}_T = \mathbf{W}_{\text{RO}} \mathbf{h}_T$, where $\mathbf{W}_{\text{RO}}$ is an $N$-by-$n$ weight matrix adjusted during training. Throughout this work, we will view said readout matrix as $N$ distinct $n$-dimensional readout vectors, that is, one for each output neuron.
Recurrent neural networks
As discussed in the introduction, throughout this work we will compare the learned dynamics and performance of the MPN to artificial RNNs. The hidden layer activity for the simplest recurrent neural network, the Vanilla RNN, is
$$\mathbf{h}_t = \tanh\left( \mathbf{W}_{\text{rec}} \mathbf{h}_{t-1} + \mathbf{W}_{\text{inp}} \mathbf{x}_t + \mathbf{b} \right) \tag{3}$$
with $\mathbf{W}_{\text{rec}}$ the recurrent weights, an $n$-by-$n$ matrix that updates the hidden neurons from one time step to the next (Figure 1, right). We also consider a more sophisticated RNN structure, the gated recurrent unit (GRU), that has additional gates to more precisely control the recurrent update of its hidden neurons (see Methods 5.2). In both these RNNs, information is stored and updated via the hidden neuron activity, so we will often refer to $\mathbf{h}_t$ as the RNNs’ hidden state or just its state. The output of the RNNs is determined through a trained readout matrix in the same manner as the MPN above, i.e. $\mathbf{y}_T = \mathbf{W}_{\text{RO}} \mathbf{h}_T$.
Training
The weights of the MPN, MPNpre, and RNNs are trained using gradient descent/backpropagation through time, specifically ADAM (Kingma and Ba, 2014). All network weights are subject to L1 regularization to encourage sparse solutions (Methods 5.2). Cross-entropy loss is used as a measure of performance during training. Gaussian noise is added to all inputs of the networks we investigate.
Results
Network dynamics on a simple integration task
Simple integration task
We begin our investigation of the MPN’s dynamics by training it on a simple $N$-class integration task, inspired by previous works on RNN integration-dynamics (Maheswaranathan et al., 2019a; Aitken et al., 2020). (Through most of this work, the number of neurons in the output layer of our networks will always be equal to the number of classes in the task, so we use $N$ to denote both unless otherwise stated.) In this task, the network will need to determine for which of the $N$ classes the input sequence contains the most evidence (Figure 2a). Each stimulus input, $\mathbf{x}_t$, can correspond to a discrete unit of evidence for one of the $N$ classes. We also allow inputs that are evidence for none of the classes. The final input, $\mathbf{x}_T$, will always be a special ‘go signal’ input that tells the network an output is expected. The network’s output should be an integration of evidence over the entire input sequence, with an output activity that is largest from the neuron that corresponds to the class with the maximal accumulated evidence. (We omit sequences with two or more classes tied for the most evidence. See Methods 5.1 for additional details.) Prior to adding noise, each possible input, including the go signal, is mapped to a random binary vector (Figure 2b). We will also investigate the effect of inserting a delay period between the stimulus period and the go signal, during which no input is passed to the network, other than noise (Figure 2c).
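The task description above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' data pipeline: the helper names (`make_codebook`, `sample_tokens`, `encode`), the token indexing (classes `0..N-1`, then a "no evidence" token, then the go signal), uniform sampling of stimuli, and the noise scale are all hypothetical choices; the delay period is not modeled.

```python
import random

def make_codebook(n_classes, input_dim, rng):
    """Map each possible input to a random binary vector.

    Assumed token layout: 0..n_classes-1 are class evidence,
    n_classes is the 'no evidence' input, n_classes + 1 is the go signal.
    """
    return [[rng.randint(0, 1) for _ in range(input_dim)]
            for _ in range(n_classes + 2)]

def sample_tokens(n_classes, seq_len, rng):
    """Sample a token sequence ending in the go signal, plus its class label.

    Sequences in which two or more classes tie for the most evidence are
    rejected and resampled, mirroring the omission of ties in the text.
    """
    if seq_len < 2:
        raise ValueError("need at least one stimulus plus the go signal")
    go_tok = n_classes + 1
    while True:
        stimuli = [rng.randrange(n_classes + 1) for _ in range(seq_len - 1)]
        counts = [stimuli.count(c) for c in range(n_classes)]
        top = max(counts)
        if counts.count(top) == 1:
            return stimuli + [go_tok], counts.index(top)

def encode(tokens, codebook, rng, noise_std=0.1):
    """Turn tokens into input vectors and add Gaussian noise (assumed scale)."""
    return [[v + rng.gauss(0.0, noise_std) for v in codebook[tok]]
            for tok in tokens]

# Example usage: a two-class task with 10-step sequences and 8-dimensional inputs.
rng = random.Random(0)
codebook = make_codebook(n_classes=2, input_dim=8, rng=rng)
tokens, label = sample_tokens(n_classes=2, seq_len=10, rng=rng)
xs = encode(tokens, codebook, rng)
```

The codebook is drawn once per task and reused across sequences, so every occurrence of a given input maps to the same binary vector before noise is added.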
Figure 2 Download asset Open asset Schematic of simple integration task. (a) Example sequence of the two-class integration task where each represents an and throughout this work, distinct classes are represented by different In this and The represent evidence for their while the represents an input that is evidence for At the of the sequence is the ‘go signal’ that the network an output is expected. The correct response for the sequence is the class with the most in the the Each possible input is mapped to a random binary The integration task can be modified by the of a between the stimulus period and the go the delay the network no input than We find the MPN is capable of learning the integration task to across a wide of class sequence and delay It is the of this to the dynamics behind the trained MPN that allow it to solve such a task and compare them to more RNN dynamics. Here, in the main we will explore the dynamics of a two-class integration task, to classes are and are discussed in the Methods We will start by the simplest of integration a delay the effects of delay we into the dynamics of the MPN, we a of the known RNN dynamics on integration-based tasks. of RNN attractor dynamics accumulated evidence both on and artificial neural networks, have that networks with recurrent connections attractor dynamics to solve integration-based tasks (Maheswaranathan et al., 2019a; Maheswaranathan and Sussillo, 2020; Aitken et al., 2020). Here, we specifically review the behavior of artificial RNNs on the aforementioned N-class integration tasks that many with of neural networks. Note also the of the dynamics can depend on between the classes et al., in this work we only investigate the where the classes are RNNs are capable of learning to solve the simple integration task at and their dynamics are qualitatively the same across several (Maheswaranathan et al., 2019b; Aitken et al., 2020). 
the network’s behavior by at individual hidden neuron activity can be difficult (Figure and so it is to to a population-level analysis of the dynamics. the number of hidden neurons is than number of integration classes the population activity of the trained RNN primarily in a low-dimensional of et al., 2020). This is to recurrent dynamics that a attractor of and the hidden activity often operates to said Methods for a more in-depth review of these results including how is In the two-class the RNN will operate to a The of hidden activity allows for an of the dynamics using a (Figure From the we see the network’s hidden activity from the attractor its As evidence for one class over the other the hidden activity accumulated evidence by the attractor (Figure The two readout vectors are with the two of the so the further the final hidden activity, is one of the attractor, the that corresponding output and thus the RNN correctly the class with the most evidence. For we note that the hidden activity of the trained RNN is not dependent upon the input of the present time step (Figure it is the change in the hidden activity from one time step to the next, that are (Figure For the Vanilla RNN (GRU), we find of the hidden activity to be by the accumulated evidence and only to be by the present input to the network Methods Figure with see all Download asset Open asset of multi-plasticity network and RNN dynamics. 
Vanilla RNN hidden neuron dynamics, see Figure 1 for (a) layer neural activity, for neurons of the RNN as a of sequence time of sequence The represents the stimulus period during which information should be across time and the representing the response to the go neuron activity, over input into their top two by relative accumulated evidence between classes at time t (Methods Also shown are of by the class readout vector and the state as with ht by input at the present time step, xt see The shows the of as a of the present input, xt, with the lines showing the for each of the MPN hidden neuron dynamics, see Figure 1 for as as as for The shows the of each with the readout vectors (Methods MPN synaptic modulation dynamics. as of hidden neuron activity, the of the SM Mt, over input Mt are for as with a different The is the same as that shown in for MPN hidden activity inputs, not so accumulated evidence We to analyzing the hidden activity of the trained in the same manner that for the RNNs. The MPN trained on a two-class integration task to have significantly more activity in the individual of ht (Figure We find the hidden neuron activity to be with it to using a (Methods Unlike the RNN, we observe the hidden neuron activity to be into several distinct (Figure input sequences ht to between said the ht by the sequence input at the present time step, we see the different inputs are the hidden activity into distinct that we (Figure the hidden neuron activity is largely dependent upon the most input to the network, rather than the accumulated evidence as we for the RNN. 
However, each we also see a in ht from accumulated evidence (Figure For the MPN we find only of the hidden activity to be by accumulated evidence and to be by the present input to the network Methods MPNpre dynamics are largely the same of we see for further the hidden neuron activity primarily dependent upon the input to the network, one may how the MPN information dependent upon the entire sequence to solve the task. the other possible inputs to the network, the go signal has its distinct which the hidden by accumulated evidence. all we find the readout vectors are with the evidence the go (Figure The are
- Peer Review Report
- 10.7554/elife.83035.sa1
- Jan 8, 2023
Synaptic modulations alone imbue networks with computational capabilities comparable to recurrent connections on several neuroscience-relevant tasks, which manifest in fundamentally different neuronal dynamics.
- Conference Article
3
- 10.1109/yac53711.2021.9486432
- May 28, 2021
The most important step in the process of medical record analysis in TCM is the classification of medical records. The biggest challenge of medical record classification is to perceive the correlation between context words, find keywords, and make judgments based on the keyword information. In this article, we propose a TCM medical record analysis algorithm based on a recurrent convolutional neural network, which introduces a maximum pooling layer into the recurrent neural network and uses it to determine the words that play an important role in text classification, thereby capturing the key components of the text. Experimental results show that the recurrent convolutional neural network achieves better results than the attention recurrent neural network and the traditional recurrent neural network. In addition, the recurrent convolutional neural network trains more than twice as fast as either of them.
- Conference Article
7
- 10.1109/icsess49938.2020.9237709
- Oct 16, 2020
Text classification is an essential and classical problem in natural language processing. Traditional text classifiers often rely on many human-designed features. With the rise of deep learning, Recurrent Neural Networks and Convolutional Neural Networks have been widely applied to text classification. Meanwhile, the success of Graph Neural Networks (GNN) on structural data has attracted many researchers to apply GNN to traditional NLP applications. However, when these methods use the GNN, they commonly ignore the word order information of the sentence. In this work, we propose a model that uses a recurrent structure to capture contextual information as far as possible when learning word representations, which preserves word-order information, unlike GNN-based networks. Then, we use the idea of GNN's message passing to aggregate the contextual information and update the word hidden representation. Like GNN's readout operation, we employ a max-pooling layer that automatically judges which words play key roles in text classification to capture the critical components in texts. We conduct experiments on four widely used datasets, and the experimental results show that our model achieves significant improvements over RNN-based and GNN-based models.
- Research Article
45
- 10.1016/j.asoc.2022.108836
- Apr 18, 2022
- Applied Soft Computing
A Lyapunov-stability-based context-layered recurrent pi-sigma neural network for the identification of nonlinear systems
- Conference Article
28
- 10.1109/icphm.2019.8819440
- Jun 1, 2019
Remaining Useful Life (RUL) prediction of rotating machinery plays a critical role in Prognostics and Health Management (PHM). Data-driven methods for RUL estimation have been widely developed because they do not depend on much prior knowledge of the system. The recurrent neural network (RNN) is capable of modeling sequential data and has been investigated for RUL prediction with statistical features of vibration signals in the time domain and frequency domain. The drawback of utilizing statistical features is that they ignore time-frequency information, which is critical in RUL prediction because the vibration signals are non-stationary when a fault occurs. To solve this problem, a novel deep architecture, named deep recurrent convolutional neural network (DRCNN), is proposed. By incorporating a convolutional operation in the state transition of the RNN, the spatial information in the time-frequency domain can be automatically learned from the vibration signals, which contributes to the improvement of prediction performance. With the convolutional operation in the RNN, both spatial information in the time-frequency domain and previous information are employed for RUL prediction. Furthermore, by stacking recurrent convolutional neural network layers, the deep architecture can learn high-level features in the time-frequency domain. Finally, an experimental analysis of RUL prediction using vibration signals of run-to-failure tests is carried out. Compared with the results of the conventional deep RNN method, the proposed method shows its effectiveness and superiority.
- Research Article
251
- 10.1609/aaai.v33i01.3301485
- Jul 17, 2019
- Proceedings of the AAAI Conference on Artificial Intelligence
Traffic prediction is of great importance to traffic management and public safety, and is very challenging because it is affected by many complex factors, such as the spatial dependency of complicated road networks and temporal dynamics. These factors make traffic prediction a challenging task due to the uncertainty and complexity of traffic states. In the literature, many research works have applied deep learning methods to traffic prediction problems by combining convolutional neural networks (CNNs) with recurrent neural networks (RNNs), in which CNNs are utilized for spatial dependency and RNNs for temporal dynamics. However, such combinations cannot capture the connectivity and globality of traffic networks. In this paper, we first propose to adopt residual recurrent graph neural networks (Res-RGNN) that can capture graph-based spatial dependencies and temporal dynamics jointly. Due to gradient vanishing, RNNs struggle to capture periodic temporal correlations. Hence, we further introduce a novel hop scheme into Res-RGNN to utilize the periodic temporal dependencies. Based on Res-RGNN and hop Res-RGNN, we finally propose a novel end-to-end multiple Res-RGNNs framework, referred to as “MRes-RGNN”, for traffic prediction. Experimental results on two traffic datasets have demonstrated that the proposed MRes-RGNN outperforms state-of-the-art methods significantly.
- Conference Article
1
- 10.18653/v1/p18-2002
- Jan 1, 2018
Increasing the capacity of recurrent neural networks (RNN) usually involves augmenting the size of the hidden layer, with significant increase of computational cost. Recurrent neural tensor networks (RNTN) increase capacity using distinct hidden layer weights for each word, but with greater costs in memory usage. In this paper, we introduce restricted recurrent neural tensor networks (r-RNTN) which reserve distinct hidden layer weights for frequent vocabulary words while sharing a single set of weights for infrequent words. Perplexity evaluations show that for fixed hidden layer sizes, r-RNTNs improve language model performance over RNNs using only a small fraction of the parameters of unrestricted RNTNs. These results hold for r-RNTNs using Gated Recurrent Units and Long Short-Term Memory.
- Research Article
12
- 10.1016/j.adhoc.2022.103016
- Oct 10, 2022
- Ad Hoc Networks
Combining recurrent and Graph Neural Networks to predict the next place’s category
- Conference Article
5
- 10.1109/vcip.2018.8698617
- Dec 1, 2018
Robust and accurate prediction of articulatory movements has various important applications, such as 3D articulatory animations and visual communication. Various approaches have been proposed to solve the acoustic-articulatory mapping problem. However, their precision is not high enough. Recently, deep neural networks (DNNs), especially convolutional neural networks (CNNs) and recurrent neural networks (RNNs), have brought tremendous success in speech recognition and synthesis. To increase the accuracy, we propose a new network architecture for acoustic-articulatory mapping, called long-term recurrent convolutional neural network (LTRCNN). The network consists of a CNN, an RNN and a skip connection. The CNN can model the spectral correlation among acoustic features efficiently. An RNN, such as long short-term memory (LSTM), can effectively learn temporal context information from sequential data. In addition, skip connections can enrich the input representation with features from different levels to preserve the feature information. Experiments show that LTRCNN achieves a state-of-the-art root-mean-squared error (RMSE) of 0.690 mm and a correlation coefficient of 0.949 in this prediction task.
- Conference Article
1
- 10.1109/sipnn.1994.344924
- Apr 13, 1994
Recurrent neural networks (RNNs) can be used to handle sequential patterns and have been used for speech recognition. To overcome the shortcomings of RNN, recurrent sub neural networks (RSNNs) are used, where an RSNN is built independently for each class. The training algorithm of the RSNN is based on the backpropagation algorithm. Speaker dependent connected Chinese digit-speech recognition experiments were carried out. Some factors influencing the performance of RSNNs have been studied. The experiments show that RSNN is easier to train and gives higher performance than RNN.
- Conference Article
5
- 10.1109/bibm55620.2022.9995451
- Dec 6, 2022
Deep learning methods have been successfully applied to the tasks of predicting functional genomic elements such as histone marks, transcription factor binding sites, non-B DNA structures, and regulatory variants. Initially, convolutional neural networks (CNN) and recurrent neural networks (RNN) or hybrid CNN-RNN models appeared to be the methods of choice for genomic studies. With the advance of machine learning algorithms, other deep learning architectures started to outperform CNN and RNN in various applications. Graph neural network (GNN) applications improved the prediction of drug effects, disease associations, protein-protein interactions, protein structures and their functions. The performance of GNN is yet to be fully explored in genomics. Earlier we developed the DeepZ approach, in which a deep learning model is trained on information from both sequence and omics data. Initially this approach was implemented with CNN and RNN but is not limited to these classes of neural networks. In this study we implemented the DeepZ approach by substituting RNN with GNN. We tested three different GNN architectures: Graph Convolutional Network (GCN), Graph Attention Network (GAT) and the inductive representation learning network GraphSAGE. The GNN models outperformed the current state-of-the-art RNN model from the initial DeepZ realization. GraphSAGE showed the best performance for the small training set of human Z-DNA ChIP-seq data, while Graph Convolutional Network was superior for specific curaxin-induced mouse Z-DNA data that was recently reported. Our results show the potential of GNN applications for the task of predicting genomic functional elements based on DNA sequence and omics data. Availability and implementation: The code is freely available at https://github.com/MrARVO/GraphZ.
- Conference Article
19
- 10.1109/icassp.2013.6638961
- May 1, 2013
This paper investigates the combination of different short-term features and the combination of recurrent and non-recurrent neural networks (NNs) on a Spanish speech recognition task. Several methods exist to combine different feature sets, such as concatenation or linear discriminant analysis (LDA). Even though all these techniques achieve reasonable improvements, feature combination by multi-layer perceptrons (MLPs) outperforms all known approaches. We develop the concept of MLP-based feature combination further using recurrent neural networks (RNNs). The phoneme posterior estimates derived from an RNN lead to a significant improvement over the result of the MLPs and achieve a 5% relative improvement in word error rate (WER) with far fewer parameters. Moreover, we improve the system performance further by combining an MLP and an RNN in a hierarchical framework. The MLP benefits from the preprocessing of the RNN. All NNs are trained on phonemes. Nevertheless, the same concepts could be applied using context-dependent states. In addition to the improvements in recognition performance w.r.t. WER, NN-based feature combination methods reduce both the training and the testing complexity. Overall, the systems are based on a single set of acoustic models, together with the training of different NNs.
- Book Chapter
3
- 10.5772/6506
- Jan 1, 2009
This paper studies adaptive control based on neural networks. First, a neural-network-based adaptive robust tracking control design is proposed for robotic systems in the presence of uncertainties. In the proposed control strategy, the NN is used to identify the modeling uncertainties, and the adverse effects caused by the neural network approximation error and external disturbances in the robotic system are then counteracted by a robust controller. In particular, the proposed control strategy is designed based on the HJI inequality theorem to overcome the requirement that the approximation error of the neural network be bounded. Simulation results show that the proposed control strategy is effective and performs better than the traditional robust control strategy. Second, an RFNN for realizing fuzzy inference using dynamic fuzzy rules is proposed. The proposed RFNN consists of four layers, and feedback connections are added in the first layer. The proposed RFNN can be used for the identification and control of dynamic systems. For identification, the RFNN needs only the current inputs and most recent outputs of the system as its inputs. For control, two RFNNs are used to constitute an adaptive control system: one is used as the identifier (RFNNI) and the other as the controller (RFNNC). To demonstrate the robustness of the proposed RFNN and control strategy, they are used to control a robot manipulator, and simulation results verify their effectiveness.