Generative Neural Networks for Data Imputation in Longitudinal Epidemiological Studies
Longitudinal epidemiological studies often face challenges with incomplete follow-up and missing data, which can bias results and reduce statistical power. Conventional imputation methods may not adequately capture the complex patterns and dependencies in such multivariate time series data. While more recently developed generative machine learning models offer improved solutions, few methods can handle inconsistently spaced intervals between measurements across long time periods and completely missing time steps, characteristics which are common in real-world studies evaluating long-term health outcomes. This paper introduces a variational autoencoder-based generative neural network designed for imputing partially and fully missing information in irregular time series with extensive missingness. Our approach exploits both correlations between features at a single time step and trends of the same feature over time to reconstruct missing values. Experiments on synthetic data designed to resemble the characteristics of longitudinal epidemiological studies, together with a case study on a real-world dataset, demonstrate the effectiveness of our approach. We show superior performance and parameter stability across varying degrees and patterns of missingness compared to prior work.
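The two signals this model exploits — correlations between features at a single time step, and each feature's own trend over time — can be illustrated with a deliberately simplified numpy sketch. This is not the paper's VAE architecture; the equal-weight averaging of the two estimates and the z-score-based cross-feature estimate are illustrative assumptions:

```python
import numpy as np

def impute_simple(X, times):
    """Toy imputation combining a temporal and a cross-feature estimate.

    X: (T, F) array with np.nan marking missing entries; times: (T,)
    possibly irregularly spaced timestamps. Each missing entry is filled
    with the mean of (a) linear interpolation of its own feature over time
    and (b) an estimate derived from the z-scores of the other features
    observed at that time step (assumes roughly co-varying features).
    """
    X = X.astype(float).copy()
    mu = np.nanmean(X, axis=0)
    sd = np.nanstd(X, axis=0)
    sd[sd == 0] = 1.0
    Z = (X - mu) / sd                      # z-scores, NaNs preserved
    for j in range(X.shape[1]):
        col = X[:, j]
        obs = ~np.isnan(col)
        for i in np.where(~obs)[0]:
            # (a) temporal estimate: interpolate this feature over time
            t_est = np.interp(times[i], times[obs], col[obs])
            z_row = Z[i, :]
            if np.all(np.isnan(z_row)):
                X[i, j] = t_est            # fully missing time step
            else:
                # (b) cross-feature estimate: mean z-score of observed features
                c_est = mu[j] + sd[j] * np.nanmean(z_row)
                X[i, j] = 0.5 * (t_est + c_est)
    return X
```

A generative model learns a far richer version of both estimates jointly; the sketch only shows why a fully missing time step remains imputable (the temporal path still applies) while partial missingness benefits from both paths.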
- Conference Article
22
- 10.1109/icdm.2005.109
- Nov 27, 2005
Multivariate time series (MTS) data sets are common in various multimedia, medical and financial application domains. These applications perform several data-analysis operations on large numbers of MTS data sets, such as similarity search, feature subset selection, clustering and classification. Correlation-based techniques, such as principal component analysis (PCA), have proven to improve the efficiency of many of the above-mentioned data-analysis operations on MTS, which implies that the correlation coefficients concisely represent the original MTS data. However, if the statistical properties (e.g., variance) of MTS data change over the time dimension, i.e., the MTS data is non-stationary, the correlation coefficients are not stable. In this paper, we propose to utilize the stationarity of MTS data sets in order to represent the original MTS data more stably, as well as concisely, with the correlation coefficients. That is, before performing any correlation-based data analysis, we first execute a stationarity test to decide whether the MTS data is stationary or not, i.e., whether the correlation is stable or not. Subsequently, for a non-stationary MTS data set, we difference it to render the data set stationary. Even though our approach is general, to focus the discussion we describe it within the context of our previously proposed technique for MTS similarity search. To show the validity of our approach, we performed several experiments on four real-world data sets. The results show that the performance of our similarity search technique has significantly improved in terms of precision/recall.
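The test-then-difference pipeline this abstract describes can be sketched in a few lines. The paper would use a formal stationarity test; the block-mean heuristic below is a crude stand-in (e.g. for an ADF test) used only to make the control flow concrete:

```python
import numpy as np

def is_stationary(x, n_blocks=4, tol=2.0):
    """Crude stationarity heuristic (a stand-in for a formal test such as
    ADF): split the series into blocks and flag it non-stationary if the
    block means drift by more than `tol` standard errors of a block mean."""
    x = np.asarray(x, float)
    blocks = np.array_split(x, n_blocks)
    means = np.array([b.mean() for b in blocks])
    se = np.std(x) / np.sqrt(len(x) / n_blocks)  # std. error of a block mean
    return np.ptp(means) <= tol * se

def preprocess(x):
    """Difference until the heuristic deems the series stationary (capped at
    two rounds), before any correlation-based analysis."""
    for _ in range(2):
        if is_stationary(x):
            break
        x = np.diff(x)
    return x
```

A series with a linear trend fails the check and gets differenced once; its first difference is constant and passes.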
- Research Article
1
- 10.1007/s11704-024-40449-z
- Jan 13, 2025
- Frontiers of Computer Science
Multivariate time series (MTS) data are vital for various applications, particularly in machine learning tasks. However, challenges such as sensor failures can result in irregular and misaligned data with missing values, thereby complicating their analysis. While recent advancements use graph neural networks (GNNs) to manage these Irregular Multivariate Time Series (IMTS) data, they generally require a reliable graph structure, either pre-existing or inferred from adequate data, to properly capture node correlations. This poses a challenge in applications where IMTS data are often streamed and waiting for future data to estimate a suitable graph structure becomes impractical. To overcome this, we introduce a dynamic GNN model suited to the streaming characteristics of IMTS data, incorporating an instance-attention mechanism that dynamically learns and updates graph edge weights for real-time analysis. We also tailor strategies for high-frequency and low-frequency data to enhance prediction accuracy. Empirical results on real-world datasets demonstrate the superiority of our proposed model in both classification and imputation tasks.
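The core idea of dynamically learned edge weights can be sketched as attention over current node states, blended with the previous graph so edges evolve smoothly as data streams in. The scaled dot-product scoring and the exponential-moving-average update below are assumptions for illustration, not the paper's exact rule:

```python
import numpy as np

def attention_edge_weights(H, prev_W=None, momentum=0.9):
    """Instance-level attention for dynamic graph edges (illustrative).

    H: (N, d) current node states. Returns a row-stochastic (N, N) weight
    matrix with no self-loops; if prev_W is given, the new scores are
    blended into it so the graph updates incrementally per instance."""
    scores = H @ H.T / np.sqrt(H.shape[1])         # scaled dot-product
    np.fill_diagonal(scores, -np.inf)              # exclude self-loops
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    W = np.exp(scores)
    W /= W.sum(axis=1, keepdims=True)              # row-wise softmax
    if prev_W is not None:
        W = momentum * prev_W + (1 - momentum) * W
    return W
```

Because both the previous and new matrices are row-stochastic, the blended matrix stays row-stochastic, which keeps downstream GNN message passing well scaled.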
- Conference Article
4
- 10.1109/icdmw53433.2021.00132
- Dec 1, 2021
In recent years, there has been an ever-increasing amount of multivariate time series (MTS) data in various domains, typically generated by a large family of sensors such as wearable devices. This has led to the development of novel learning methods on MTS data, with deep learning models dominating the most recent advancements. Prior literature has primarily focused on designing new network architectures for modeling temporal dependencies within MTS. However, a less studied challenge is associated with the high dimensionality of MTS data. In this paper, we propose a novel neural component, namely the Neural Feature Selector (NFS), as an end-to-end solution for feature selection in MTS data. Specifically, NFS is based on a decomposed convolution design and includes two modules: first, each feature stream within the MTS is processed by a temporal CNN independently; then an aggregating CNN combines the processed streams to produce input for other downstream networks. We evaluated the proposed NFS model on four real-world MTS datasets and found that it achieves comparable results with state-of-the-art methods while providing the benefit of feature selection. Our paper also highlights the robustness and effectiveness of feature selection with NFS compared to using recent autoencoder-based methods.
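The two-stage decomposed-convolution design can be sketched directly: an independent 1-D filter per feature stream, followed by an aggregation stage that mixes the filtered streams. Using a single kernel per feature and a plain linear mix for the aggregating stage are simplifications of the paper's CNN modules:

```python
import numpy as np

def nfs_forward(X, temporal_kernels, agg_weights):
    """Minimal decomposed-convolution sketch of the NFS idea.

    X: (T, F) multivariate series. Stage 1: each feature f is filtered
    independently by its own 1-D kernel (the per-stream temporal CNN).
    Stage 2: the filtered streams are mixed with one weight per feature;
    the magnitude of each weight plays the role of a feature-selection
    score for downstream networks."""
    T, F = X.shape
    streams = np.stack(
        [np.convolve(X[:, f], temporal_kernels[f], mode="same") for f in range(F)],
        axis=1,
    )                                   # (T, F) independently filtered streams
    return streams @ agg_weights        # (T,) aggregated representation
```

In the trained model both stages are learned jointly, so uninformative streams can be driven toward zero weight, which is where the feature selection comes from.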
- Research Article
6
- 10.1371/journal.pone.0062974
- May 7, 2013
- PLoS ONE
Molecular phenotyping technologies (e.g., transcriptomics, proteomics, and metabolomics) offer the possibility to simultaneously obtain multivariate time series (MTS) data from different levels of information processing and metabolic conversions in biological systems. As a result, MTS data capture the dynamics of biochemical processes and components whose couplings may involve different scales and exhibit temporal changes. Therefore, it is important to develop methods for determining the time segments in MTS data, which may correspond to critical biochemical events reflected in the coupling of the system’s components. Here we provide a novel network-based formalization of the MTS segmentation problem based on temporal dependencies and the covariance structure of the data. We demonstrate that the problem of partitioning MTS data into segments to maximize a distance function, operating on polynomially computable network properties often used in the analysis of biological networks, can be efficiently solved. To enable biological interpretation, we also propose a breakpoint-penalty (BP-penalty) formulation for determining MTS segmentation which combines a distance function with the number/length of segments. Our empirical analyses of synthetic benchmark data as well as time-resolved transcriptomics data from the metabolic and cell cycles of Saccharomyces cerevisiae demonstrate that the proposed method accurately infers the phases in the temporal compartmentalization of biological processes. In addition, through comparison on the same data sets, we show that the results from the proposed formalization of the MTS segmentation problem match biological knowledge and provide more rigorous statistical support in comparison to the contending state-of-the-art methods.
- Research Article
- 10.36001/ijphm.2025.v16i2.4571
- Nov 20, 2025
- International Journal of Prognostics and Health Management
Multivariate time series (MTS) data is widely utilized in industrial manufacturing, equipment maintenance, and health monitoring. However, its high dimensionality, dynamic nature, and heterogeneity bring significant challenges for modeling. Traditional deep learning algorithms based on sequential modeling struggle to capture the complex structural relationships between different time series variables, making it difficult to uncover interaction patterns and potential dependencies. To address the dynamic and complex dependencies among variables in MTS data and further balance the importance distribution across multiple temporal feature channels, this work proposes a channel-aware multi-scale adaptive graph interaction network (CMAGIN) for MTS forecasting. The proposed framework integrates a dynamic and adaptive graph constructor with local awareness and global attention (DAGC-LAGA) and a channel-wise adaptive center enhancement (CACE) mechanism. DAGC-LAGA captures sparse neighborhood relations through a multi-view local dynamic graph constructor and further leverages a global attention graph enhancer to model semantic correlations; the resulting graph structures effectively capture dynamic dependencies among variables. The CACE module dynamically enhances key node features by calculating node importance at the channel level. In addition, applying the centrality-aware attention mechanism improves the sensitivity of the model to crucial temporal patterns. Furthermore, the results are verified on the C-MAPSS dataset for aircraft engine degradation prediction. Experimental results demonstrate that the CMAGIN model outperforms comparative methods in both RMSE and Score metrics, and exhibits robust performance under complex operating conditions and multiple-fault scenarios.
Future research could investigate scalable applications of CMAGIN across diverse industrial scenarios to enable field deployment of intelligent operation and maintenance systems.
- Research Article
1
- 10.1016/j.ins.2023.119872
- Nov 8, 2023
- Information Sciences
Matrix-based vs. vector-based linear discriminant analysis: A comparison of regularized variants on multivariate time series data
- Research Article
51
- 10.5194/cp-10-107-2014
- Jan 16, 2014
- Climate of the Past
Abstract. Paleoclimate time series are often irregularly sampled and age uncertain, which is an important technical challenge to overcome for successful reconstruction of past climate variability and dynamics. Visual comparison and interpolation-based linear correlation approaches have been used to infer dependencies from such proxy time series. While the first is subjective, not measurable and not suitable for the comparison of many data sets at a time, the latter introduces interpolation bias, and both face difficulties if the underlying dependencies are nonlinear. In this paper we investigate similarity estimators that could be suitable for the quantitative investigation of dependencies in irregular and age-uncertain time series. We compare the Gaussian-kernel-based cross-correlation (gXCF, Rehfeld et al., 2011) and mutual information (gMI, Rehfeld et al., 2013) against their interpolation-based counterparts and the new event synchronization function (ESF). We test the efficiency of the methods in estimating coupling strength and coupling lag numerically, using ensembles of synthetic stalagmites with short, autocorrelated, linear and nonlinearly coupled proxy time series, and in the application to real stalagmite time series. In the linear test case, coupling strength increases are identified consistently for all estimators, while in the nonlinear test case the correlation-based approaches fail. The lag at which the time series are coupled is identified correctly as the maximum of the similarity functions in around 60–55% (in the linear case) to 53–42% (for the nonlinear processes) of the cases when the dating of the synthetic stalagmite is perfectly precise. If the age uncertainty increases beyond 5% of the time series length, however, the true coupling lag is identified no more often than the other lags for which the similarity function was estimated. Age uncertainty contributes up to half of the uncertainty in the similarity estimation process.
Time series irregularity contributes less, particularly for the adapted Gaussian-kernel-based estimators and the event synchronization function. The introduced link strength concept summarizes the hypothesis test results and balances the individual strengths of the estimators: while gXCF is particularly suitable for short and irregular time series, gMI and the ESF can identify nonlinear dependencies. ESF could, in particular, be suitable to study extreme event dynamics in paleoclimate records. Programs to analyze paleoclimatic time series for significant dependencies are included in a freely available software toolbox.
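The Gaussian-kernel cross-correlation idea at the heart of gXCF can be sketched compactly: instead of interpolating onto a regular grid, every observation pair contributes to the estimate at a given lag, weighted by how close its actual time offset is to that lag. This is a sketch after Rehfeld et al. (2011) for standardized (zero-mean, unit-variance) series; the bandwidth handling is simplified:

```python
import numpy as np

def gxcf(tx, x, ty, y, lag, h):
    """Gaussian-kernel cross-correlation at a given lag for two irregularly
    sampled, standardized series (illustrative sketch).

    tx, ty: observation times; x, y: standardized values; h: kernel
    bandwidth. Pair (x_i, y_j) is weighted by a Gaussian in the mismatch
    between its time offset (ty_j - tx_i) and the requested lag, so no
    interpolation onto a regular grid is needed."""
    dt = ty[None, :] - tx[:, None] - lag          # (len(x), len(y)) offsets
    w = np.exp(-0.5 * (dt / h) ** 2)              # Gaussian lag weights
    return np.sum(w * np.outer(x, y)) / np.sum(w)
```

Scanning `lag` over a range and taking the argmax of `gxcf` is how the coupling lag is identified in the experiments described above.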
- Research Article
560
- 10.1609/aaai.v33i01.33011409
- Jul 17, 2019
- Proceedings of the AAAI Conference on Artificial Intelligence
Nowadays, multivariate time series data are increasingly collected in various real-world systems, e.g., power plants, wearable devices, etc. Anomaly detection and diagnosis in multivariate time series refer to identifying abnormal status in certain time steps and pinpointing the root causes. Building such a system, however, is challenging, since it requires not only capturing the temporal dependency in each time series, but also encoding the inter-correlations between different pairs of time series. In addition, the system should be robust to noise and provide operators with different levels of anomaly scores based upon the severity of different incidents. Despite the fact that a number of unsupervised anomaly detection algorithms have been developed, few of them can jointly address these challenges. In this paper, we propose a Multi-Scale Convolutional Recurrent Encoder-Decoder (MSCRED) to perform anomaly detection and diagnosis in multivariate time series data. Specifically, MSCRED first constructs multi-scale (resolution) signature matrices to characterize multiple levels of the system statuses in different time steps. Subsequently, given the signature matrices, a convolutional encoder is employed to encode the inter-sensor (time series) correlations, and an attention-based Convolutional Long-Short Term Memory (ConvLSTM) network is developed to capture the temporal patterns. Finally, based upon the feature maps which encode the inter-sensor correlations and temporal information, a convolutional decoder is used to reconstruct the input signature matrices, and the residual signature matrices are further utilized to detect and diagnose anomalies. Extensive empirical studies based on a synthetic dataset and a real power plant dataset demonstrate that MSCRED can outperform state-of-the-art baseline methods.
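The signature matrices MSCRED builds are simple to state: for each window length w, the (i, j) entry is the rescaled inner product of sensors i and j over the last w steps, capturing inter-sensor correlation at several temporal resolutions. The default window lengths below are placeholders:

```python
import numpy as np

def signature_matrices(X, windows=(10, 30, 60)):
    """Multi-scale signature matrices in the MSCRED sense.

    X: (T, F) series with F sensors. For each window length w, computes the
    (F, F) matrix whose (i, j) entry is sum_t x_i(t) x_j(t) / w over the
    last w steps — a rescaled correlation between sensors i and j at that
    temporal resolution. Returns one matrix per window."""
    mats = []
    for w in windows:
        seg = X[-w:]                   # the most recent w steps
        mats.append(seg.T @ seg / w)   # (F, F) signature matrix
    return mats
```

In the full pipeline these matrices (one stack per time step) are what the convolutional encoder consumes and the decoder reconstructs; large reconstruction residuals flag anomalous sensor pairs.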
- Research Article
10
- 10.1609/aaai.v37i6.25876
- Jun 26, 2023
- Proceedings of the AAAI Conference on Artificial Intelligence
Real-world applications often involve irregular time series, for which the time intervals between successive observations are non-uniform. Irregularity across multiple features in a multivariate time series further results in a different subset of features at any given time (i.e., asynchronicity). Existing pre-training schemes for time series, however, often assume regularity of the time series and make no special treatment of irregularity. We argue that such irregularity offers insight about domain properties of the data—for example, frequency of hospital visits may signal patient health condition—that can guide representation learning. In this work, we propose PrimeNet to learn a self-supervised representation for irregular multivariate time series. Specifically, we design a time-sensitive contrastive learning and data reconstruction task to pre-train a model. Irregular time series exhibit considerable variation in sampling density over time. Hence, our triplet generation strategy follows the density of the original data points, preserving its native irregularity. Moreover, the sampling density variation over time makes data reconstruction difficult for different regions. Therefore, we design a data masking technique that always masks a constant time duration to accommodate reconstruction for regions of different sampling density. We learn with these tasks using unlabeled data to build a pre-trained model and fine-tune on a downstream task with limited labeled data, in contrast with existing fully supervised approaches for irregular time series, which require large amounts of labeled data. Experiment results show that PrimeNet significantly outperforms state-of-the-art methods on naturally irregular and asynchronous data from healthcare and IoT applications for several downstream tasks, including classification, interpolation, and regression.
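The constant-duration masking idea is easy to make concrete: hide every observation inside a fixed-length time window, so dense regions lose many points and sparse regions few. Window placement (here a caller-supplied start) is left open; the paper's placement strategy is not reproduced:

```python
import numpy as np

def mask_constant_duration(times, values, start, duration):
    """Mask all observations in the window [start, start + duration).

    Because the window has fixed *duration* rather than a fixed number of
    points, reconstruction difficulty adapts to local sampling density —
    the property PrimeNet's reconstruction task relies on. Returns
    (masked_values, mask) with hidden entries set to NaN."""
    mask = (times >= start) & (times < start + duration)
    masked = values.astype(float).copy()
    masked[mask] = np.nan
    return masked, mask
```

Masking a fixed count of points instead would make dense regions trivially easy (neighbors are close) and sparse regions disproportionately hard, which is exactly what the constant-duration scheme avoids.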
- Research Article
19
- 10.1016/s0378-3758(02)00461-5
- Nov 15, 2002
- Journal of Statistical Planning and Inference
Information complexity criteria for detecting influential observations in dynamic multivariate linear models using the genetic algorithm
- Research Article
283
- 10.1016/j.neucom.2021.02.046
- Mar 3, 2021
- Neurocomputing
A review of irregular time series data handling with gated recurrent neural networks
- Research Article
6
- 10.1109/jbhi.2024.3395446
- Jul 1, 2024
- IEEE journal of biomedical and health informatics
The real-world Electronic Health Records (EHRs) present irregularities due to changes in the patient's health status, resulting in various time intervals between observations and different physiological variables examined at each observation point. There have been recent applications of Transformer-based models in the field of irregular time series. However, the full attention mechanism in the Transformer overly focuses on distant information, ignoring the short-term correlations of the condition. As a result, the model cannot capture localized changes or short-term fluctuations in patients' conditions. Therefore, we propose a novel end-to-end Deformable Neighborhood Attention Transformer (DNA-T) for irregular medical time series. The DNA-T captures local features by dynamically adjusting the receptive field of attention and aggregating relevant deformable neighborhoods in irregular time series. Specifically, we design a Deformable Neighborhood Attention (DNA) module that enables the network to attend to relevant neighborhoods by shifting the receptive field of neighborhood attention. The DNA enhances the model's sensitivity to local information and its representation of local features, thereby capturing the correlation of localized changes in patients' conditions. We conduct extensive experiments to validate the effectiveness of DNA-T, outperforming existing state-of-the-art methods in predicting the mortality risk of patients. Moreover, we visualize an example to validate the effectiveness of the proposed DNA.
- Conference Article
3
- 10.1109/ickg52313.2021.00027
- Dec 1, 2021
Few clustering methods show good performance on multivariate time series (MTS) data. Traditional methods rely too much on similarity measures and perform poorly on MTS data with complex structures. This paper proposes an MTS clustering algorithm based on graph embedding, called MTSC-GE, to improve the performance of MTS clustering. MTSC-GE can map MTS samples to feature representations in a low-dimensional space and then cluster them. While mining the information of the samples themselves, MTSC-GE builds the whole time series data into a graph, paying attention to the connections between samples from an overall perspective and discovering the local structural features of MTS data. The proposed MTSC-GE consists of three stages. The first stage builds a graph from the original dataset, where each MTS sample is regarded as a node in the graph. The second stage uses a graph embedding technique to obtain a new representation of each node. Finally, MTSC-GE uses the K-Means algorithm to cluster based on the newly obtained representations. We compare MTSC-GE with six state-of-the-art benchmark methods on five public datasets; experimental results show that MTSC-GE achieves good performance.
- Research Article
26
- 10.1109/tkde.2022.3218803
- Jan 1, 2022
- IEEE Transactions on Knowledge and Data Engineering
Multi-variate time series (MTS) data is a ubiquitous class of data abstraction in the real world. Any instance of MTS is generated from a hybrid dynamical system whose specific dynamics are usually unknown. The hybrid nature of such a dynamical system is a result of complex external attributes, such as geographic location and time of day, each of which can be categorized into either spatial attributes or temporal attributes. Therefore, there are two fundamental views which can be used to analyze MTS data, namely the spatial view and the temporal view. Moreover, from each of these two views, we can partition the set of data samples of MTS into disjoint forecasting tasks in accordance with their associated attribute values. Then, samples of the same task manifest similar forthcoming patterns, which are easier to predict than in the original single-view setting. Considering this insight, we propose a novel multi-view multi-task (MVMT) learning framework for MTS forecasting. Instead of being explicitly presented in most scenarios, MVMT information is deeply concealed in the MTS data, which severely hinders the model from capturing it naturally. To this end, we develop two kinds of basic operations, namely task-wise affine transformation and task-wise normalization. Applying these two operations with prior knowledge on the spatial and temporal views allows the model to adaptively extract MVMT information while predicting. Extensive experiments on three datasets are conducted to illustrate that canonical architectures can be greatly enhanced by the MVMT learning framework in terms of both effectiveness and efficiency. In addition, we design rich case studies to reveal the properties of representations produced at different phases in the entire prediction procedure.
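Of the two basic operations named above, task-wise normalization is the simpler to sketch: samples sharing an attribute value (e.g. the same hour of day, or the same sensor location) form one task and are standardized with that task's own statistics rather than global ones. The grouping by integer task labels is an illustrative simplification:

```python
import numpy as np

def taskwise_normalize(X, task_ids, eps=1e-8):
    """Task-wise normalization in the spirit of the MVMT framework.

    X: (N, d) samples; task_ids: (N,) integer labels assigning each sample
    to a task (e.g. derived from a spatial or temporal attribute). Each
    task's samples are standardized with the mean and std of that task
    alone, so tasks with very different scales become comparable."""
    out = np.empty_like(X, dtype=float)
    for t in np.unique(task_ids):
        idx = task_ids == t
        mu = X[idx].mean(axis=0)
        sd = X[idx].std(axis=0) + eps     # eps guards constant features
        out[idx] = (X[idx] - mu) / sd
    return out
```

After this operation, a forecaster sees each task on a common scale, which is one way the framework makes per-task patterns "less sophisticated" to predict.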
- Research Article
- 10.1016/j.ins.2024.121233
- Jul 22, 2024
- Information Sciences
Efficient semi-supervised clustering with pairwise constraint propagation for multivariate time series