Efficient time series forecasting with gated attention and patched data: A transformer-based approach
- Research Article
- 10.1016/j.ijforecast.2021.11.011
- Jan 15, 2022
- International Journal of Forecasting
Probabilistic time series forecasting is crucial in many application domains, such as retail, e-commerce, finance, and biology. With the increasing availability of large volumes of data, a number of neural architectures have been proposed for this problem. In particular, Transformer-based methods achieve state-of-the-art performance on real-world benchmarks. However, these methods require a large number of parameters to be learned, which imposes high memory requirements on the computational resources for training such models. To address this problem, we introduce a novel bidirectional temporal convolutional network that requires an order of magnitude fewer parameters than a common Transformer-based approach. Our model combines two temporal convolutional networks: the first network encodes future covariates of the time series, whereas the second network encodes past observations and covariates. We jointly estimate the parameters of an output distribution via these two networks. Experiments on four real-world datasets show that our method performs on par with four state-of-the-art probabilistic forecasting methods, including a Transformer-based approach and WaveNet, on two point metrics (sMAPE and NRMSE) as well as on a set of range metrics (quantile loss percentiles) in the majority of cases. We also demonstrate that our method requires significantly fewer parameters than Transformer-based methods, so the model can be trained faster and with significantly lower memory requirements, which in turn reduces the infrastructure cost of deploying these models.
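The two-network design described above can be sketched in a few lines of numpy. This is an illustrative toy with random, untrained weights and invented sizes, not the paper's implementation (which learns its parameters by likelihood maximization): one causal network summarizes the past, another encodes known future covariates, and both feed per-step Gaussian output parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def causal_conv(x, w):
    """1-D causal convolution: the output at step t sees only x[:t+1]."""
    k = len(w)
    xp = np.concatenate([np.zeros(k - 1), x])
    return np.array([xp[t:t + k] @ w for t in range(len(x))])

# Hypothetical tiny setup: 16 past observations, 8 future covariate steps.
past = rng.normal(size=16)
future_cov = rng.normal(size=8)

# Encoder 1 summarizes history; encoder 2 encodes the known future covariates.
h_past = causal_conv(past, rng.normal(size=3))[-1]
h_future = causal_conv(future_cov, rng.normal(size=3))

# Jointly map both encodings to per-step Gaussian output parameters.
mu = h_future + 0.5 * h_past
sigma = np.log1p(np.exp(h_future - h_past))  # softplus keeps the scale positive
```

In the actual model the combination weights are learned, but the shape of the computation (two stacks of causal convolutions, merged into distribution parameters) is the part the sketch shows.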
- Research Article
- 10.55592/cilamce.v6i06.8206
- Dec 2, 2024
- Ibero-Latin American Congress on Computational Methods in Engineering (CILAMCE)
Precipitation represents a critical meteorological phenomenon that exerts a substantial influence on different geographic regions, as well as playing a fundamental role in various human activities. Notably, Rio de Janeiro experiences unstable weather conditions that lead to sudden and intense rainfall. Consequently, forecasting such precipitation patterns, particularly extreme events, is of fundamental importance in mitigating adverse impacts. Artificial Neural Networks (ANNs) present a promising path for predicting time series data, with transformer architectures emerging as an efficient option. Recognized for their versatility across diverse tasks, transformers have demonstrated effectiveness in time series forecasting, with the Autoformer model emerging as a standout performer, achieving state-of-the-art performance levels. However, the computational demands inherent in transformer-based models, including significant time and memory requirements, have led to the exploration of simpler alternatives. Linear models, such as DLinear, have been proposed as computationally efficient alternatives, capable of providing predictive performance comparable or superior to transformers. The objective of this study was to evaluate and contrast the predictive effectiveness of the linear model with the transformer-based approach in precipitation forecasting tasks for data from Rio de Janeiro. The dataset was obtained through the INMET meteorological system, covering historical records from 2002 to 2023, from four meteorological stations distributed in Rio de Janeiro, Brazil. When it comes to precipitation forecasting, the presence of data imbalance, particularly with regard to extreme events characterized by precipitation exceeding 25 mm, represents a significant challenge. In the scope of this work, the dataset was used in unbalanced form. To train the models, the dataset was partitioned into training (60%), validation (20%) and test (20%) subsets. 
Both models were instantiated with equal parameters, including a sequence length and prediction length of 96, a batch size of 32, 20 epochs, and EarlyStopping and ReduceLROnPlateau callbacks with a patience of 3. The mean squared error (MSE) served as the primary metric for optimizing the loss function during training and for evaluating predictive performance. Finally, the study seeks to evaluate the quality of the models for predicting precipitation in the Rio de Janeiro region. By evaluating meteorological data, the study aims to contribute to the understanding of model performance in precipitation forecasting tasks. Our analysis sought to provide insights into which architecture to choose for precipitation forecasting with an unbalanced dataset.
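As a rough illustration of this experimental setup, the sketch below builds the chronological 60/20/20 split and fits a DLinear-style direct linear map from 96 input steps to 96 forecast steps. Least squares stands in for gradient training, the synthetic series is invented, and the real DLinear additionally decomposes the input into trend and seasonal parts before applying its linear layers.

```python
import numpy as np

rng = np.random.default_rng(1)
series = rng.gamma(2.0, 1.0, size=2000)  # synthetic stand-in for rainfall data

# Chronological 60/20/20 train/validation/test split, as in the study.
n = len(series)
train, val, test = np.split(series, [int(0.6 * n), int(0.8 * n)])

seq_len = pred_len = 96  # the stated sequence and prediction lengths

def make_windows(x, seq_len, pred_len):
    """Slide a (seq_len -> pred_len) window over a 1-D series."""
    X, Y = [], []
    for i in range(len(x) - seq_len - pred_len + 1):
        X.append(x[i:i + seq_len])
        Y.append(x[i + seq_len:i + seq_len + pred_len])
    return np.array(X), np.array(Y)

X_tr, Y_tr = make_windows(train, seq_len, pred_len)

# DLinear's core is one linear map per horizon step: fit a
# (seq_len x pred_len) weight matrix by least squares.
W, *_ = np.linalg.lstsq(X_tr, Y_tr, rcond=None)

X_te, Y_te = make_windows(test, seq_len, pred_len)
mse = np.mean((X_te @ W - Y_te) ** 2)  # the study's evaluation metric
```

The point of the sketch is how little machinery the linear baseline needs compared with a transformer, which is exactly the trade-off the study examines.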
- Research Article
- 10.2196/63962
- Mar 18, 2025
- Journal of Medical Internet Research
Monitoring the emotional states of patients with psychiatric problems has always been challenging due to the noncontinuous nature of clinical assessments, the effect of the health care environment, and the inherent subjectivity of evaluation instruments. However, mental states in psychiatric disorders exhibit substantial variability over time, making real-time monitoring crucial for preventing risky situations and ensuring appropriate treatment. This study aimed to leverage new technologies and deep learning techniques to enable more objective, real-time monitoring of patients. This was achieved by passively monitoring variables such as step count, patient location, and sleep patterns using mobile devices. We aimed to predict patient self-reports and detect sudden variations in their emotional valence, identifying situations that may require clinical intervention. Data for this project were collected using the Evidence-Based Behavior (eB2) app, which records both passive and self-reported variables daily. Passive data refer to behavioral information gathered via the eB2 app through sensors embedded in mobile devices and wearables. These data were obtained from studies conducted in collaboration with hospitals and clinics that used eB2. We used hidden Markov models (HMMs) to address missing data and transformer deep neural networks for time-series forecasting. Finally, classification algorithms were applied to predict several variables, including emotional state and responses to the Patient Health Questionnaire-9. Through real-time patient monitoring, we demonstrated the ability to accurately predict patients' emotional states and anticipate changes over time. Specifically, our approach achieved high accuracy (0.93) and a receiver operating characteristic (ROC) area under the curve (AUC) of 0.98 for emotional valence classification. For predicting emotional state changes 1 day in advance, we obtained an ROC AUC of 0.87. 
Furthermore, we demonstrated the feasibility of forecasting responses to the Patient Health Questionnaire-9, with particularly strong performance for certain questions. For example, on question 9, related to suicidal ideation, our model achieved an accuracy of 0.9 and an ROC AUC of 0.77 for predicting the next day's response. Moreover, the stability of multivariate time-series forecasting improved when HMM preprocessing was combined with a transformer model, as opposed to other time-series forecasting methods (eg, recurrent neural networks and long short-term memory cells), leveraging the attention mechanisms to capture longer time dependencies and gain interpretability. We showed the potential to assess the emotional state of a patient and the scores of psychiatric questionnaires from passive variables in advance. This allows real-time monitoring of patients and hence better risk detection and treatment adjustment.
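A minimal sketch of the HMM-based handling of missing passive data, assuming a toy two-state Gaussian HMM with invented parameters (eg, sedentary vs active daily step counts) and using only the forward pass; the full approach would fit the HMM to data and use forward-backward smoothing before handing the completed series to the transformer.

```python
import numpy as np

# Hypothetical two-state HMM over daily step counts; parameters are
# illustrative, not fitted.
means = np.array([2000.0, 9000.0])      # sedentary vs active state means
sds = np.array([800.0, 1500.0])
A = np.array([[0.8, 0.2],               # state transition matrix
              [0.3, 0.7]])
pi = np.array([0.5, 0.5])               # initial state distribution

obs = np.array([2100.0, np.nan, 8500.0, 9200.0, np.nan, 1800.0])

def emission(x):
    """Gaussian emission likelihood per state; a missing day gives no evidence."""
    if np.isnan(x):
        return np.ones(2)
    return np.exp(-0.5 * ((x - means) / sds) ** 2) / sds

# Forward pass, normalizing at each step to avoid underflow.
alpha = pi * emission(obs[0])
alpha /= alpha.sum()
posteriors = [alpha]
for x in obs[1:]:
    alpha = (alpha @ A) * emission(x)
    alpha /= alpha.sum()
    posteriors.append(alpha)

# Impute each missing day with its posterior-weighted state mean.
imputed = np.array([x if not np.isnan(x) else p @ means
                    for x, p in zip(obs, posteriors)])
```

The imputed series has no gaps, so a downstream forecaster never has to handle missingness itself, which is the role HMM preprocessing plays in the pipeline described above.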
- Research Article
- 10.17973/mmsj.2025_11_2025142
- Nov 12, 2025
- MM Science Journal
Tool wear represents a central challenge for manufacturing companies. The resulting workpiece rejects and machine downtimes cause significant costs. One difficulty lies in predicting the optimal tool change timing. In practice, two suboptimal scenarios occur: Either tools are changed too early, not fully utilizing their service life, or too late, which can result in quality losses or tool breakage. In the context of Industry 4.0 and manufacturing digitalization, large amounts of process data are continuously generated, enabling indirect process control of tool wear. The temporal dependence of process data and the multitude of influencing factors require the development of powerful analysis methods. This paper examines the development of a concept for detecting tool condition using a Transformer-based approach in milling and drilling processes. The captured motor current of the machine axes is analysed. The concept uses implicit labelling of training data, utilizing only sensor signals from unworn tools. The Transformer encoder learns a representation of the unworn machining state, based on which a linear decoder performs time series prediction. The reconstruction error, i.e., the deviation between predicted and actual values, serves as an indicator of tool condition. Statistical parameters of the reconstruction error enable quantitative comparison between normal and worn tool behaviour. Besides presenting the concept, the implementation, development of a suitable model architecture and determination of optimal hyperparameters are addressed.
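The reconstruction-error idea above can be sketched with an AR(2) one-step predictor standing in for the Transformer encoder plus linear decoder. The signals, the chatter model of wear, and all constants are invented for illustration; what carries over is that the predictor is calibrated on unworn-tool data only (implicit labelling) and that statistics of the prediction residual separate normal from worn behaviour.

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0, 20 * np.pi, 400)

# "Training" data: a motor-current stand-in recorded from unworn tools only.
healthy = np.sin(t) + 0.05 * rng.normal(size=t.size)

def fit_predictor(x):
    """Stand-in for encoder + linear decoder: an AR(2) one-step predictor."""
    X = np.column_stack([x[1:-1], x[:-2]])
    w, *_ = np.linalg.lstsq(X, x[2:], rcond=None)
    return w

def reconstruction_error(x, w):
    """Deviation between predicted and actual values."""
    return x[2:] - (w[0] * x[1:-1] + w[1] * x[:-2])

w = fit_predictor(healthy)

# A "worn" signal with added chatter no longer matches the learned dynamics,
# so its reconstruction-error statistics grow.
worn = np.sin(t) + 0.5 * np.sin(8 * t) + 0.05 * rng.normal(size=t.size)
healthy_std = reconstruction_error(healthy, w).std()
worn_std = reconstruction_error(worn, w).std()
```

Comparing residual statistics such as these standard deviations gives the quantitative normal-vs-worn comparison the concept relies on, without ever labelling worn examples.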
- Research Article
- 10.36001/phmap.2023.v4i1.3773
- Sep 4, 2023
- PHM Society Asia-Pacific Conference
This study aims to assess the effectiveness of the Transformer-based reconstruction approach for detecting anomalies in time series data. The reconstruction error-based anomaly detection method was applied to both multivariate time series from NASA SMAP/MSL and univariate time series from UCR. Four deep learning models, including Transformer, Dilated CNN, LSTM, and MLP, were compared in terms of their ability to reconstruct input data. Dilated CNN outperformed the other models in almost all experimental results, achieving a 25% higher score than Transformer on the UCR dataset when trained with random masking, and a 60% higher score when trained with middle masking. These results suggest that the Transformer did not perform as well as expected for anomaly detection based on time series reconstruction errors, and its inferiority to Dilated CNN may be attributed to the characteristics of the time series and the limited training data. Future research should focus on developing Transformer models that can better capture the properties of time series data and investigate the relationship between the model’s performance, data volume, and model complexity.
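The two masking schemes compared above can be illustrated with a trivial reconstructor (linear interpolation standing in for the deep models). Everything here is a toy, but it shows why middle masking, which hides one long contiguous block, is the harder reconstruction task than random masking at the same masking ratio.

```python
import numpy as np

rng = np.random.default_rng(3)

def random_mask(n, frac=0.25):
    """Hide a random subset of time steps (random masking)."""
    m = np.ones(n, dtype=bool)
    m[rng.choice(n, int(frac * n), replace=False)] = False
    return m

def middle_mask(n, frac=0.25):
    """Hide one contiguous block in the centre (middle masking)."""
    m = np.ones(n, dtype=bool)
    k = int(frac * n)
    s = (n - k) // 2
    m[s:s + k] = False
    return m

x = np.sin(np.linspace(0, 4 * np.pi, 200))

def reconstruct(x, mask):
    """Stand-in reconstructor: linear interpolation from visible steps."""
    idx = np.arange(len(x))
    return np.interp(idx, idx[mask], x[mask])

err_random = np.mean((reconstruct(x, random_mask(len(x))) - x) ** 2)
err_middle = np.mean((reconstruct(x, middle_mask(len(x))) - x) ** 2)
```

With random masking, most hidden points have close visible neighbours; with middle masking the model must extrapolate structure across a long gap, which is where architectural differences such as Dilated CNN vs Transformer become visible.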
- Research Article
- 10.3390/e25020180
- Jan 17, 2023
- Entropy
Anomaly detection in multivariate time series is an important problem with applications in several domains. However, the key limitation of the approaches that have been proposed so far lies in the lack of a highly parallel model that can fuse temporal and spatial features. In this paper, we propose TDRT, a three-dimensional ResNet and transformer-based anomaly detection method. TDRT can automatically learn the multi-dimensional features of temporal-spatial data to improve the accuracy of anomaly detection. Using the TDRT method, we were able to obtain temporal-spatial correlations from multi-dimensional industrial control temporal-spatial data and quickly mine long-term dependencies. We compared the performance of five state-of-the-art algorithms on three datasets (SWaT, WADI, and BATADAL). TDRT achieves an average anomaly detection F1 score higher than 0.98 and a recall of 0.98, significantly outperforming five state-of-the-art anomaly detection methods.
- Research Article
- 10.1007/s44230-023-00037-z
- Jul 20, 2023
- Human-Centric Intelligent Systems
The transformer-based approach excels in long-term series forecasting. These models leverage stacking structures and self-attention mechanisms, enabling them to effectively model dependencies in series data. While some approaches prioritize sparse attention to tackle the quadratic time complexity of self-attention, this can limit information utilization. We introduce a novel double-branch attention mechanism that simultaneously captures intricate dependencies from both temporal and variable perspectives. Moreover, we propose query-independent attention, taking into account the near-identical attention that self-attention allocates to different query positions. This enhances efficiency and reduces the impact of redundant information. We integrate the double-branch query-independent attention into popular transformer-based methods such as Informer, Autoformer, and the Non-stationary transformer. The results of experiments on six practical benchmarks consistently validate that our novel attention mechanism substantially improves long-term series forecasting performance in contrast to the baseline approaches.
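A sketch of the query-independent idea, under our illustrative assumption that sharing one pooled query approximates the near-identical per-query attention rows the abstract describes; the shapes and data are invented, and the paper's actual pooling and double-branch wiring may differ.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 128, 16
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Standard self-attention: one weight row per query position, O(n^2 * d).
full = softmax(Q @ K.T / np.sqrt(d)) @ V

# Query-independent attention (sketch): if the attention rows are
# near-identical, a single pooled query suffices, O(n * d).
q_bar = Q.mean(axis=0)
weights = softmax(q_bar @ K.T / np.sqrt(d))  # one distribution over all keys
shared_out = weights @ V                     # reused at every query position
```

The saving comes from computing one key-score vector instead of n of them, which is why dropping the per-query dependence removes the quadratic term.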
- Research Article
- 10.3389/fmars.2024.1374902
- Apr 10, 2024
- Frontiers in Marine Science
Accurate significant wave height (SWH) forecasting is essential for various marine activities. While traditional numerical and mathematical-statistical methods have made progress, there is still room for improvement. This study introduces a novel transformer-based approach called the 2D-Geoformer to enhance SWH forecasting accuracy. The 2D-Geoformer combines the spatial distribution capturing capabilities of SWH numerical models with the ability of mathematical-statistical methods to identify intrinsic relationships among datasets. Using a comprehensive long time series of SWH numerical hindcast datasets as the numerical forecasting database and ERA5 reanalysis SWH datasets as the observational proxies database, with a focus on a 72-hour forecasting window, the 2D-Geoformer is designed. By training the potential connections between SWH numerical forecasting fields and forecasting errors, we can retrieve SWH forecasting errors for each numerical forecasting case. The corrected forecasting results can be obtained by subtracting the retrieved SWH forecasting errors from the original numerical forecasting fields. During long-term validation periods, this method consistently and effectively corrects numerical forecasting errors for almost every case, resulting in a significant reduction in root mean square error compared to the original numerical forecasting fields. Further analysis reveals that this method is particularly effective for numerical forecasting fields with higher errors compared to those with relatively smaller errors. This integrated approach represents a substantial advancement in SWH forecasting, with the potential to improve the accuracy of operational SWH forecasts. The 2D-Geoformer combines the strengths of numerical models and mathematical-statistical methods, enabling better capture of spatial distributions and intrinsic relationships in the data. 
The method's effectiveness in correcting numerical forecasting errors, particularly for cases with higher errors, highlights its potential for enhancing SWH forecasting accuracy in operational settings.
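The correction step described above reduces to subtracting a learned error estimate from the numerical forecast. Below, a simple linear error model stands in for the 2D-Geoformer on invented data; the point is the train-on-errors, subtract-at-inference structure, not the model class.

```python
import numpy as np

rng = np.random.default_rng(5)

# Illustrative stand-ins: a "numerical forecast" with a state-dependent bias,
# and ERA5-like observations as the truth proxy.
truth = 2.0 + np.sin(np.linspace(0, 6 * np.pi, 300))
numerical = 1.3 * truth + 0.05 * rng.normal(size=truth.size)

# Train an error model on the first 200 cases: predict the forecasting error
# from the numerical forecast itself (linear fit standing in for the network).
coef = np.polyfit(numerical[:200], (numerical - truth)[:200], 1)

# At inference, retrieve the error for each new case and subtract it.
pred_err = np.polyval(coef, numerical[200:])
corrected = numerical[200:] - pred_err

rmse_raw = np.sqrt(np.mean((numerical[200:] - truth[200:]) ** 2))
rmse_cor = np.sqrt(np.mean((corrected - truth[200:]) ** 2))
```

Because the synthetic bias grows with wave height, the correction helps most where the raw error is largest, mirroring the behaviour reported for the 2D-Geoformer.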
- Research Article
- 10.71465/fbf420
- Oct 25, 2025
- Frontiers in Business and Finance
The accurate and timely estimation of option Greeks remains a critical challenge in financial risk management, particularly during periods of extreme market volatility when traditional computational methods encounter severe limitations in both speed and reliability. This paper presents a novel application of Transformer-based deep learning architectures to the problem of real-time option Greeks estimation under extreme market conditions, addressing fundamental challenges that have constrained conventional approaches including computational bottlenecks, numerical instability, and inadequate handling of long-range temporal dependencies in volatility dynamics. We develop a specialized attention mechanism that exploits the structural properties of option surfaces while maintaining computational efficiency through strategic architectural design incorporating multi-head self-attention, gated neural network mechanisms that enforce economic rationality constraints, and positional encoding adapted for financial time series exhibiting non-stationary behavior. The empirical investigation employs comprehensive datasets spanning multiple market regimes including the 2008 financial crisis characterized by VIX levels exceeding 80 percent as documented in detailed intraday records, the August 2015 volatility spike reaching 53 percent, and the March 2020 COVID-19 pandemic market disruption with VIX peaking at 89.53 percent, providing robust assessment across diverse stress scenarios that reveal the catastrophic failure modes of traditional methods. Our Transformer-based approach achieves Delta estimation accuracy with Mean Absolute Error below 0.001 for at-the-money options during normal market conditions and maintains stable performance with MAE below 0.002 during extreme volatility events where traditional finite difference methods exhibit errors exceeding 0.05, representing a more than twenty-five-fold improvement in accuracy under stress conditions.
The architecture leverages a stratified training strategy that oversamples extreme volatility regimes by factors exceeding thirteen times their natural occurrence frequency, ensuring robust generalization to crisis scenarios despite their rarity in historical data comprising less than two percent of trading days. Furthermore, the architecture delivers inference latency below 100 microseconds per option contract on modern GPU hardware, enabling genuine real-time Greeks calculation for large portfolios containing thousands of positions that require continuous hedging adjustments as volatility surfaces shift rapidly during market stress. This research establishes Transformer models as a transformative methodology for derivatives risk management, offering practitioners a robust tool for maintaining accurate hedge ratios and risk metrics even during the most turbulent market periods when precise Greeks estimation proves most critical for portfolio survival.
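The stratified training strategy above can be sketched as weighted resampling. The volatility distribution and the roughly 2 percent extreme share are synthetic stand-ins; the 13x oversampling factor is the one quoted in the abstract.

```python
import numpy as np

rng = np.random.default_rng(6)

# Synthetic daily volatility levels; the top ~2% of days play the role of
# the rare crisis regimes described above.
vol = rng.lognormal(mean=np.log(0.18), sigma=0.35, size=5000)
extreme = vol > np.quantile(vol, 0.98)

# Stratified resampling: weight extreme days 13x their natural frequency,
# then draw training indices with those weights.
weights = np.where(extreme, 13.0, 1.0)
weights = weights / weights.sum()
sample = rng.choice(len(vol), size=5000, p=weights)

natural_share = extreme.mean()          # ~2% of trading days
sampled_share = extreme[sample].mean()  # heavily enriched in the batch
```

Enriching crisis-regime examples in each batch is what lets the model generalize to stress scenarios it would otherwise see almost never during training.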
- Conference Article
- 10.1109/ijcnn64981.2025.11229278
- Jun 30, 2025
SAT: A Sparse Attention Transformer-based Approach for Predicting Market Risk Factors Using Stock Time Series Data
- Research Article
- 10.70023/sahd/250208
- Feb 24, 2025
- PatternIQ Mining
Sophisticated prediction models are needed in intelligent settings to improve system efficiency, security, and customer satisfaction. This study of transformer-based time-series forecasting models for intelligent prediction covers numerous applications, including energy management, advanced driver-assistance infrastructure, and in-car technologies. The proposed method identifies long-term correlations and complex temporal patterns in multi-feature datasets using self-attention. The Transformer's topology is adapted to forecast time series. The dataset is prepared for anomaly recognition, event estimation, and trend assessment by cleaning and classifying the event types with different activity rates. The Transformer simulation then receives the actual dataset. Key findings reveal that the Transformer-based approach forecasts consecutive network configurations more accurately and consumes fewer computational resources than conventional techniques. The model's capabilities for identifying outliers and adjusting to shifting event distributions promote adaptive ambient decision-making; the Transformer technique thus lays the groundwork for AI-based prediction, particularly improving sophisticated systems' capacity to interpret complex, event-driven streams of information.
- Conference Article
- 10.1109/imtic58887.2023.10178457
- May 10, 2023
A Transformer-based approach for Fake News detection using Time Series Analysis