Recurrent Neural Networks for Multivariate Time Series with Missing Values
This paper introduces GRU-D, a deep learning model based on Gated Recurrent Units that leverages missing data patterns, such as masking and time intervals, to improve multivariate time series prediction. Experiments on clinical and synthetic datasets show that GRU-D achieves state-of-the-art performance, effectively capturing long-term dependencies and utilizing missingness for enhanced accuracy.
Multivariate time series data in practical applications, such as health care, geoscience, and biology, are characterized by a variety of missing values. In time series prediction and other related tasks, it has been noted that missing values and their missing patterns are often correlated with the target labels, a.k.a., informative missingness. There is very limited work on exploiting the missing patterns for effective imputation and improving prediction performance. In this paper, we develop novel deep learning models, namely GRU-D, as one of the early attempts. GRU-D is based on Gated Recurrent Unit (GRU), a state-of-the-art recurrent neural network. It takes two representations of missing patterns, i.e., masking and time interval, and effectively incorporates them into a deep model architecture so that it not only captures the long-term temporal dependencies in time series, but also utilizes the missing patterns to achieve better prediction results. Experiments of time series classification tasks on real-world clinical datasets (MIMIC-III, PhysioNet) and synthetic datasets demonstrate that our models achieve state-of-the-art performance and provide useful insights for better understanding and utilization of missing values in time series analysis.
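The two missingness representations named in the abstract above, masking and time interval, can be illustrated with a small sketch. The abstract does not define them, so this is a hedged reading: the mask is taken to be 1 where a value is observed and 0 otherwise, and the time interval is the time elapsed since the variable was last observed. The function name `masking_and_intervals` and the NaN-as-missing convention are assumptions made for illustration, not part of the paper's code.

```python
import numpy as np

def masking_and_intervals(x, timestamps):
    """Build mask and time-interval arrays for a series with missing values.

    x          : (T, D) array, NaN marks a missing observation (assumed convention)
    timestamps : (T,) array of observation times
    """
    x = np.asarray(x, dtype=float)
    s = np.asarray(timestamps, dtype=float)

    # mask[t, d] = 1 if variable d is observed at step t, else 0
    mask = (~np.isnan(x)).astype(float)

    # delta[t, d] = time since variable d was last observed (0 at the first step)
    delta = np.zeros_like(x)
    for t in range(1, x.shape[0]):
        gap = s[t] - s[t - 1]
        # If the previous step was observed the interval restarts at `gap`;
        # otherwise the previous interval is carried forward and extended.
        delta[t] = gap + (1.0 - mask[t - 1]) * delta[t - 1]
    return mask, delta

nan = float("nan")
x = [[1.0, nan], [nan, 2.0], [3.0, nan], [nan, nan]]
mask, delta = masking_and_intervals(x, timestamps=[0, 1, 3, 4])
print(mask)
print(delta)
```

Both arrays have the same shape as the input, so under this reading they could be fed to a recurrent model alongside the (imputed) values.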
- Dissertation
- 10.33915/etd.12347
- Jan 1, 2024
In this study, we propose a novel anomaly detection framework designed specifically for Multivariate Time Series (MTS) data, addressing the prevalent challenges in analyzing such complex datasets. The detection of anomalies within MTS data is notably difficult due to the complex interplay of numerous variables, temporal dependencies, and the common issue of class imbalance, where one category significantly outnumbers another. Traditional deep learning (DL) approaches often fall short in simultaneously tackling these issues. Our framework is designed to address these challenges through a two-phased approach. Phase I employs Conditional Tabular Generative Adversarial Networks (CTGAN) to create strategic synthetic data, setting the stage for Phase II, which utilizes a hybrid DL architecture. This architecture combines Gated Recurrent Units (GRU), Temporal Convolutional Networks (TCN), and an Attention Mechanism, significantly improving the detection of anomalies. Our approach is tailored to overcome the hurdles of class imbalance — using strategic data augmentation in Phase I — and to address the intricacies of variable interactions and long-term temporal dependencies through a hybrid DL model in Phase II. The efficacy of our framework is demonstrated through the Controlled Anomaly Time Series (CATS) dataset, notable for its complexity with over 5 million timestamps, 17 features, and a marked class imbalance. Our methodology distinguishes itself by detecting subtle anomalies, capturing long-range dependencies more effectively, and enhancing interpretability through the visualization of attention weights. Furthermore, our anomaly detection framework is both scalable and adaptable across different domains, marking a considerable improvement over existing methods. 
A performance comparison with other models, including standalone GRU, TCN, combined GRU-TCN, and GRU-TCN with Attention, showcases the superior capability of our framework, particularly in managing the intricacies and rarity of anomalies in the CATS dataset. This framework not only addresses the challenges of data imbalance and complexity inherent in MTS datasets but also harnesses the strengths of various DL architectures to provide an effective anomaly detection solution. Our contribution promises significant advancements in the accuracy, reliability, and interpretability of anomaly detection models, representing a major leap forward in this domain.
- Conference Article
43
- 10.1109/icdm.2005.109
- Nov 27, 2005
Multivariate time series (MTS) data sets are common in various multimedia, medical and financial application domains. These applications perform several data-analysis operations on a large number of MTS data sets, such as similarity searches, feature-subset selection, clustering and classification. Correlation-based techniques, such as principal component analysis (PCA), have proven to improve the efficiency of many of the above-mentioned data-analysis operations on MTS, which implies that the correlation coefficients concisely represent the original MTS data. However, if the statistical properties (e.g., variance) of MTS data change over the time dimension, i.e., the MTS data is non-stationary, the correlation coefficients are not stable. In this paper, we propose to utilize the stationarity of the MTS data sets in order to represent the original MTS data more stably, as well as concisely, with the correlation coefficients. That is, before performing any correlation-based data analysis, we first execute a stationarity test to decide whether the MTS data is stationary or not, i.e., whether the correlation is stable or not. Subsequently, for a non-stationary MTS data set, we difference it to render the data set stationary. Even though our approach is general, to focus the discussion we describe our approach within the context of our previously proposed technique for MTS similarity search. In order to show the validity of our approach, we performed several experiments on four real-world data sets. The results show that the performance of our similarity search technique has significantly improved in terms of precision/recall.
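The differencing step described in this abstract can be sketched as follows. This is a minimal illustration, assuming first-order differencing along the time axis and Pearson correlation between variables; the stationarity test itself is omitted because the abstract does not say which test is used. The function names `difference` and `correlation_matrix` are hypothetical.

```python
import numpy as np

def difference(mts, order=1):
    """Difference an MTS of shape (T, D) along the time axis."""
    mts = np.asarray(mts, dtype=float)
    return np.diff(mts, n=order, axis=0)

def correlation_matrix(mts):
    """Pearson correlation between the D variables (columns) of an MTS."""
    return np.corrcoef(mts, rowvar=False)

# Two random walks: non-stationary by construction (assumed toy data).
rng = np.random.default_rng(0)
noise = rng.normal(size=(200, 2))
walk = np.cumsum(noise, axis=0)

print(correlation_matrix(walk))              # correlation of the raw series
print(correlation_matrix(difference(walk)))  # correlation after differencing
```

Differencing a random walk recovers its increments, which is why the second matrix reflects the relationship between the underlying noise terms rather than the accumulated drift.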
- Research Article
4
- 10.1007/s11082-025-08090-7
- Mar 10, 2025
- Optical and Quantum Electronics
Recent studies on channel estimation in wireless communication systems have focused on deep learning methods. Our primary contribution is based on the use of DenseNet121 hybridized with Random Forest (RF), Gated Recurrent Units (GRU), Long Short-Term Memory Networks (LSTM), and Recurrent Neural Networks (RNN) to improve the channel estimation and lower the error rate. In order to mitigate inter-symbol interference and map the datasets, this paper introduces M-quadrature amplitude modulation (16-QAM) and orthogonal frequency division multiplexing (OFDM), which is based on quadrature phase shift keying (QPSK). Additionally, the existence or lack of cyclic prefixes forms the basis of our simulation, and the suggested models are investigated using pilot samples 2, 4, 8, and 64. Labeled OFDM signal samples, where the labels match the signal received after applying OFDM and passing through the medium, are used to train the proposed models. DenseNet121 functions as a powerful feature extractor to extract intricate spatial information from received signal data. Sequential models such as RNN, LSTM, and GRU are used to model temporal dependencies in the retrieved features. RF is also utilized to exploit non-linear relationships and interactions between features to further increase prediction accuracy and reduce bit error rate (BER). When the models are compared using key metrics such as accuracy, bit error rate (BER), and mean squared error (MSE), the DenseNet121_RNN_GRU_RF model attains superior performance. Additionally, the deep learning models are assessed against traditional methods such as minimum mean square error (MMSE) and least squares (LS). A comparison of the suggested models without cyclic prefix for OFDM_QPSK shows that the DenseNet121_RNN_GRU_RF model achieves a considerable gain over alternative architectures, with an improvement of 36.3% over DenseNet121-RNN-LSTM-RF. 
The improvements are roughly 63.3% over DenseNet121-RNN-LSTM, 68.18% over DenseNet121-GRU, 72.7% over DenseNet121-LSTM, and 86.3% over DenseNet121-RNN. With cyclic prefix for OFDM_QPSK, the DenseNet121_RNN_GRU_RF model also performs better than the other suggested models. Compared to DenseNet121_RNN_LSTM_RF, the DenseNet121_RNN_GRU_RF model improves BER by about 45%. It outperforms DenseNet121_RNN_LSTM by roughly 66.6%, DenseNet121_GRU by 71.4%, DenseNet121_LSTM by 80.9%, and DenseNet121_RNN by 90.4%. Additionally, DenseNet121_RNN_GRU_RF shows a significant improvement over LS, achieving a 70% improvement over the LS approach. DenseNet121_RNN_GRU_RF outperforms the Minimum Mean Square Error (MMSE) method by roughly 39.5%. Additionally, when using QPSK, higher pilot counts typically translate into lower MSE values. At MSE = 10^-3, the improvement of employing 64 pilot bits over 8 pilot bits is approximately 12.1%. Utilizing eight pilot bits improves performance by roughly 21.2% compared to utilizing two or four pilot bits. Performance is improved by approximately 18.9% at BER = 10^-4 when there are eight pilots instead of four. Furthermore, there is a 13.8% improvement in accuracy from 8 to 64 pilots, indicating that more pilots can further increase accuracy. Finally, BER performance is greatly improved with additional pilots, as evidenced by the noteworthy 35.3% improvement between 4 and 64 pilots. For OFDM-QPSK, employing CP often results in an improvement of roughly 9% over not utilizing CP. Compared to the LS and MMSE models, the DenseNet121_RNN_GRU_RF model provides a significant reduction in error rate, with a computing time of 4.215 s. 
This suggests that the model's capacity to precisely estimate the channel and reduce bit errors has significantly improved.
- Research Article
60
- 10.1007/s11280-021-01003-0
- Apr 14, 2022
- World Wide Web
In recent years, artificial intelligence technologies have been successfully applied in time series prediction and analytic tasks. At the same time, a lot of attention has been paid to financial time series prediction, which targets the development of novel deep learning models or the optimization of forecasting results. To optimize the accuracy of stock price prediction, in this paper, we propose a clustering-enhanced deep learning framework to predict stock prices with three mature deep learning forecasting models: Long Short-Term Memory (LSTM), Recurrent Neural Network (RNN) and Gated Recurrent Unit (GRU). The proposed framework treats clustering as a forecasting pre-processing step, which can improve the quality of the trained models. To achieve effective clustering, we propose a new similarity measure, called Logistic Weighted Dynamic Time Warping (LWDTW), by extending the Weighted Dynamic Time Warping (WDTW) method to capture the relative importance of return observations when calculating distance matrices. Specifically, based on the empirical distributions of stock returns, the cost weight function of WDTW is modified with the logistic probability density function. In addition, we further implement the clustering-based forecasting framework with the above three deep learning models. Finally, extensive experiments on daily US stock price data sets show that our framework achieves excellent forecasting performance, with the overall best results for the combination of Logistic WDTW clustering and the LSTM model under 5 different evaluation metrics.
- Research Article
2
- 10.22219/kinetik.v6i4.1330
- Nov 30, 2021
- Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control
One of the oldest known predictive analytics techniques is time series prediction. The target in time series prediction is to use historical data about a specific quantity to predict the value of the same quantity in the future. Multivariate time series (MTS) data has been widely used in time series prediction research because it is considered better than univariate time series (UTS) data. However, in reality MTS data sets contain various types of information, which makes it difficult to extract the information needed to predict the situation. Therefore, UTS data still has a chance to be developed because it is actually simpler than MTS data. UTS prediction treats forecasting as a single-variable problem, whereas MTS may employ a large number of time-concurrent series to make predictions. A Neural Network (NN) model can be built to predict the target variable given the other (predictor) variables. In this study, we used the Particle Swarm Optimization (PSO) algorithm to optimize the performance of an NN on a UTS dataset. Our proposed model is validated using cross-validation, and RMSE is used to measure its performance. The experimental results show that NN performance after optimization using PSO produces good results compared to classical NN performance. This is evidenced by the value of RMSE = 0.410, which is the smallest RMSE value produced. The smaller the RMSE value, the better the model performance. It can be concluded that the proposed method can improve NN performance on UTS data.
- Research Article
4
- 10.1016/j.procs.2024.10.301
- Jan 1, 2024
- Procedia Computer Science
Weather Prediction in Agriculture Yields with Transformer Model
- Conference Article
1
- 10.1145/3714334.3714351
- Dec 20, 2024
Vacuum glass, a highly effective thermal insulation material, has a diverse array of applications in the manufacturing of home appliances and the improvement of energy efficiency in buildings. Accurately determining the heat transfer coefficient of vacuum glass is essential for optimising its design and application. The heat transfer coefficient of vacuum glass cannot be detected rapidly and in batches using conventional methods. Deep learning models are capable of learning the behaviour of the heat transfer coefficient from historical data and providing a rapid and precise prediction of the heat transfer coefficient for new samples. In recent years, recurrent neural networks (RNN) and their variants have demonstrated exceptional performance in time-series prediction tasks, thereby offering novel solutions to the issue of swiftly determining the heat transfer coefficient of vacuum glass. The objective of this investigation is to compare three recurrent neural network models (simple RNN, long short-term memory network (LSTM), and gated recurrent unit (GRU)) in order to ascertain the most appropriate model for predicting the heat transfer coefficients of vacuum glass. The experimental results show that the prediction accuracy of the GRU model on the test set is significantly better than that of the simple RNN and LSTM models. In addition, the GRU model maintains a low computational complexity. In the future, additional research will be conducted on deep learning models that are similar to GRU in order to enhance the interpretability and reliability of the prediction and apply them to real-time monitoring systems.
- Conference Article
56
- 10.2118/196011-ms
- Sep 23, 2019
The rapid development of machine learning algorithms and the massive accumulation of well data from continuous monitoring have enabled new applications in the oil and gas industry. Data gathered from well sensors are a foundation of oilfield digitization and data-driven analysis. Here, we describe a deep learning approach to predict long-term well performance based on a moderate duration of well monitoring data. In this study, we first developed the data processing procedures for oilfield time series data and determined the proper selection of data sampling frequency, parameter combinations and data structures for deep learning models. Then we explored how Deep Learning (DL) models can be employed for well data analysis and how we can combine physics and DL models. A Recurrent Neural Network (RNN) is a type of sequential DL model, which can be utilized for time series data analysis. This approach preserves preceding information and yields the current response with memory of prior well behavior. Two candidate RNN models, Gated Recurrent Unit (GRU) and Long Short Term Memory (LSTM), were tried to determine how well they were able to improve the accuracy and stability of well performance estimates. In addition, a novel combination of RNN with Convolutional Neural Networks (CNNs), the Long- and Short-term Time-series network (LSTNet), was also investigated. These various models were tested and compared based on the public production datasets from the Volve Field. Both GRU and LSTM achieved higher accuracy in performance prediction compared to the simple RNN. In the case of frequent well shut-in and opening, the failure of the simple RNN to capture fast pressure responses and extreme fluctuations ultimately leads to high error. In contrast, LSTNet is more stable under frequent or significant well variations. 
With advanced deep learning structures, engineers can interpret long-term reservoir performance information from responses estimated by deep learning models, instead of performing costly well tests or shut-ins.
- Book Chapter
9
- 10.1016/b978-0-12-819365-5.00013-9
- Jul 24, 2020
- Statistical Process Monitoring using Advanced Data-Driven and Deep Learning Approaches
Chapter 7 - Unsupervised recurrent deep learning scheme for process monitoring
- Research Article
250
- 10.1016/j.aej.2022.01.011
- Jan 6, 2022
- Alexandria Engineering Journal
Comparative analysis of Gated Recurrent Units (GRU), long Short-Term memory (LSTM) cells, autoregressive Integrated moving average (ARIMA), seasonal autoregressive Integrated moving average (SARIMA) for forecasting COVID-19 trends
- Research Article
26
- 10.1016/j.eswa.2024.124550
- Jun 25, 2024
- Expert Systems With Applications
Ensemble empirical mode decomposition based deep learning models for forecasting river flow time series
- Research Article
27
- 10.1007/s11227-019-02991-7
- Sep 11, 2019
- The Journal of Supercomputing
Missing values are common in the Internet of Things (IoT) environment for various reasons, including regular maintenance or malfunction. In time-series prediction in the IoT, missing values may have a relationship with the target labels, and their missing patterns result in informative missingness. Thus, missing values can be a barrier to achieving high accuracy of prediction and analysis in data mining in the IoT. Although several methods have been proposed to estimate values that are missing, few studies have investigated the comparison of interpolation methods using conventional and deep learning models. There has thus far been relatively little research into interpolation methods in the IoT environment. To address these problems, this paper presents the use of linear regression, support vector regression, artificial neural networks, and long short-term memory to make time-series predictions for missing values. Finally, a full comparison and analysis of interpolation methods are presented. We believe that these findings can be of value to future work in IoT applications.
- Dissertation
3
- 10.11606/t.3.2021.tde-10082021-160557
- Jun 15, 2021
Artificial intelligence models are considered state of the art in several domains. Deep reinforcement learning models, one of the main categories of artificial intelligence models, have a high potential for being applied in domains with high complexity, nonlinearities, the existence of autocorrelation, seasonal and cyclical components, and noise. One highly relevant domain that presents these characteristics is stock market trading. Recent works were conducted in this domain using deep reinforcement learning. Nevertheless, these did not consider integrating other relevant components such as price time series prediction and market sentiment analysis. Another critical gap is the lack of comparison of different deep reinforcement learning models in different stock trading scenarios. Besides being an important developing market, the Brazilian stock market is one of the 20 biggest markets in the world. A critical problem for all the investors in this stock market is how to improve the strategies and systems used for improving returns, considering their associated risks. This research aims to investigate and propose a system for automatic asset trading considering multiple features, time series prediction, sentiment analysis, and deep reinforcement learning models. The methodology used was a simulation of the market environment, considering one asset and the evaluation of two relevant scenarios. Eight versions of the proposed system were implemented and evaluated, considering six relevant domain metrics and the buy-and-hold strategy, the main baseline model in the literature. For the first scenario, which simulated a cycle with upward and downward trends, the system's configuration that presented the best results used the price prediction component obtained from a recurrent neural network with a maximum order size of 200 stocks. It obtained better results than the baseline model. 
For the second scenario, which simulated a deep downward trend, all the system configurations presented better results than the baseline model. The configuration using a recurrent neural network for price prediction and a maximum order size of 10 stocks presented the best results. The main contribution of this research for the deep reinforcement learning area was the proposal of a system that uses additional time series analysis and sentiment analysis features extracted with deep learning models. The main contribution of this research for stock market trading was to propose the use of deep reinforcement learning considering as features: market prices, volume traded, technical indicators, and price and market sentiment predictions obtained using deep learning models. The proposed system can be used in different markets and assets and adapted to other sub-domains.
- Conference Article
1
- 10.1109/iccect57938.2023.10140414
- Apr 28, 2023
Because pruritus is often overlooked and undertreated in the clinical setting, a major unmet need is objective measures of behaviors associated with scratching in order to quantify itch severity and frequency, since scratching directly correlates with itch. Such methods to measure itch and how itch severity changes over time are needed to objectively study and understand pruritus, develop and assess the efficacy of new medications, quantify disease severity in patients, and monitor treatment response. Wearable sensors in the form of wrist actigraphy, which detects wrist movements over time using micro-accelerometers, are the most studied and tested method to detect scratching events. To address these issues, 7 deep learning models are trained and tested for scratch detection: Convolutional Neural Network (CNN), Recurrent Neural Network (RNN) – Gated Recurrent Unit (GRU), RNN – Long Short-Term Memory (LSTM), CNN & RNN – GRU (end-to-end), CNN & RNN – LSTM (end-to-end), CNN & RNN – GRU (parallel) and CNN & RNN – LSTM (parallel). The final results show that deep learning can accurately detect scratching in various situations (the CNN achieved a high accuracy of 0.996) and can provide useful information (time, frequency, scratched body part, etc.) regarding scratching behavior during the day and at night, in order to better quantify pruritus for use in the medical field.
- Research Article
- 10.47514/kjcs/2024.1.3.0011
- Sep 30, 2024
- Kasu Journal of Computer Science
Background: With their volatile prices, cryptocurrencies have become valuable assets in the financial market. Predicting cryptocurrency prices accurately is essential for making well-informed investment decisions. Time series prediction models, like Gated Recurrent Unit (GRU) and Recurrent Neural Networks (RNN), are popular tools for financial data forecasting because they can capture sequential dependencies in data. Aim: This study aims to predict the average monthly closing prices of five major cryptocurrencies, Bitcoin (BTC), Ethereum (ETH), Binance Coin (BNB), Litecoin (LTC), and Ripple (XRP), using GRU and RNN models and to evaluate their performance in forecasting these prices. Method: Historical price data for the chosen cryptocurrencies were preprocessed using Min-Max Scaling and converted into time series input sequences. The data were divided into training and test sets and used to train both the GRU and RNN models. The Mean Absolute Percentage Error (MAPE) and Root Mean Squared Error (RMSE) were used to assess the performance of the models. Results: For the majority of cryptocurrencies, the RNN model exhibited better predictive accuracy and consistently outperformed the GRU model. For instance, the RMSE for Ripple was 0.06 for the RNN model and 0.09 for GRU. In a similar vein, the RNN model outperformed the GRU model with a MAPE of 12.97% for Ethereum. These results imply that RNN models are more suitable for financial forecasting in this sector, as they yield more accurate predictions for cryptocurrency values.
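The preprocessing and evaluation steps named in the Method section above (Min-Max Scaling, input sequences, RMSE and MAPE) can be sketched as follows. This is an illustrative sketch only: the window length, the toy price values, and all function names are assumptions, and the GRU/RNN models themselves are not reproduced.

```python
import numpy as np

def min_max_scale(values):
    """Scale values linearly to [0, 1]; assumes the series is not constant."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    return (values - lo) / (hi - lo)

def make_windows(series, window):
    """Turn a 1-D series into (inputs, targets) pairs for one-step forecasting."""
    series = np.asarray(series, dtype=float)
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X, y

def rmse(y_true, y_pred):
    """Root Mean Squared Error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent; y_true must be non-zero."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(100.0 * np.mean(np.abs((y_true - y_pred) / y_true)))

prices = [10.0, 20.0, 30.0, 40.0, 50.0]   # toy monthly closing prices
scaled = min_max_scale(prices)
X, y = make_windows(scaled, window=2)      # each row of X predicts one entry of y
print(X.shape, y.shape)
print(rmse([1, 2, 3], [1, 2, 5]), mape([100, 200], [110, 180]))
```

In practice the scaling parameters would be fitted on the training split only and reused on the test split, so that test-set statistics do not leak into training.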