A Multimodal Approach to Improve the Prediction Capabilities of Deep Learning Models for Multivariate Time Series—A Case of Subtropical Urban Air Quality
Environmental engineering plays a critical role in managing air quality services, which are of daily concern to the public, particularly as climate change alters the factors affecting air quality. Within this context, our study introduces a comprehensive approach that emphasizes predictive models relying on multivariate time-series data. By integrating data from various sources and modalities, we propose a multimodal deep learning method to enhance traditional unimodal models. This study includes a review of existing literature, the preparation of relevant datasets, the development of robust models, and extensive evaluations. The experiments feature a case study focused on air quality services in a subtropical city, aiming to provide insights for improving prediction models. The integrated multimodal approach offers a better understanding of environmental conditions by combining data from automatic air quality monitors, meteorological stations, the European Centre for Medium-Range Weather Forecasts reanalysis data, as well as public welfare information and societal disruption reports. The analysis also considers weather-related alerts, such as typhoon and rainstorm warnings, which lead to school closures and city-wide suspensions. The model incorporates emission sources and upwind areas. Preliminary causality tests confirm that augmenting the feature space to encompass upwind areas enhances the model's analytical capability: downwind pollution and environmental conditions are significantly influenced by socio-economic activities in upwind areas. Granger causality and Diebold-Mariano tests highlight the importance of public welfare information and societal disruption reports, addressing a critical gap in this field.
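The Granger causality tests invoked here can be made concrete with a minimal bivariate sketch. This is an illustrative NumPy-only version, not the study's actual pipeline; the lag order, simulated coefficients, and the comparison against a noise regressor are assumptions for demonstration:

```python
import numpy as np

def granger_f_stat(y, x, lags=2):
    """F-statistic for H0: lagged x does not help predict y.

    Compares a restricted AR model of y against an unrestricted model
    that also includes lagged values of x (bivariate Granger test).
    """
    n = len(y)
    Y = y[lags:]
    # Lagged columns of y (restricted model), plus an intercept.
    Z_r = np.column_stack([y[lags - k - 1:n - k - 1] for k in range(lags)])
    Z_r = np.column_stack([np.ones(len(Y)), Z_r])
    # Unrestricted model adds lagged columns of x.
    Z_u = np.column_stack(
        [Z_r] + [x[lags - k - 1:n - k - 1][:, None] for k in range(lags)])
    rss = lambda Z: np.sum((Y - Z @ np.linalg.lstsq(Z, Y, rcond=None)[0]) ** 2)
    rss_r, rss_u = rss(Z_r), rss(Z_u)
    df_num, df_den = lags, len(Y) - Z_u.shape[1]
    return ((rss_r - rss_u) / df_num) / (rss_u / df_den)

# Simulate: x leads y by one step, so x should Granger-cause y.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.normal()
noise = rng.normal(size=500)
print(granger_f_stat(y, x) > granger_f_stat(y, noise))  # expect True
```

In practice the F-statistic would be compared against the F(lags, df) critical value (e.g. via `scipy.stats.f`) rather than against another series' statistic.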
- Book Chapter
4
- 10.1007/978-3-031-33374-3_7
- Jan 1, 2023
This paper studies an effective unsupervised deep learning model for multivariate time series anomaly detection. Since multivariate time series usually suffer from insufficient labeling and highly complex temporal correlations, effectively detecting anomalies in such data is particularly challenging. To address this problem, we propose a model named Wasserstein-GAN with gradient Penalty and effective Scoring (WPS). In this model, the Wasserstein distance with gradient penalty helps capture the data regularities between the generator output and real data, improving training stability. Meanwhile, an effective scoring function consisting of reconstruction error, discrimination error, and prediction error is designed to evaluate the accuracy and recall of anomaly prediction. Experimental results show that, compared with the second-best baseline model, the proposed WPS obtains 17.68% and 10.41% improvements in prediction precision and F1 score, respectively.
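The composite scoring function described above can be sketched as a weighted sum of the three error terms. The weights, the discriminator probability, and the toy inputs are illustrative assumptions; the paper's actual networks and weighting are not reproduced:

```python
import numpy as np

def wps_style_score(x, x_recon, x_pred, disc_prob, w=(0.4, 0.3, 0.3)):
    """Composite anomaly score in the spirit of WPS: a weighted sum of
    reconstruction error, discrimination error, and prediction error.
    The weights `w` are illustrative, not the paper's values.
    """
    recon_err = np.mean((x - x_recon) ** 2, axis=-1)  # generator reconstruction
    disc_err = 1.0 - disc_prob                        # critic's "realness" deficit
    pred_err = np.mean((x - x_pred) ** 2, axis=-1)    # one-step forecast error
    return w[0] * recon_err + w[1] * disc_err + w[2] * pred_err

# Toy usage: a point far from its reconstruction and forecast scores higher.
x = np.array([[0.0, 0.0], [5.0, 5.0]])
x_recon = np.zeros_like(x)
x_pred = np.zeros_like(x)
disc_prob = np.array([0.9, 0.1])
scores = wps_style_score(x, x_recon, x_pred, disc_prob)
print(scores[1] > scores[0])  # anomalous second point scores higher: True
```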
- Research Article
- 10.1016/j.neunet.2025.107922
- Dec 1, 2025
- Neural networks : the official journal of the International Neural Network Society
JCCMTM: Joint channel-independent and channel-dependent strategy for masked multivariate time-series modeling.
- Research Article
48
- 10.1016/j.envres.2020.110214
- Sep 15, 2020
- Environmental Research
Short-term effects of air pollution on cause-specific mental disorders in three subtropical Chinese cities
- Research Article
1
- 10.7717/peerj-cs.2172
- Jul 31, 2024
- PeerJ Computer Science
Multivariate time series anomaly detection is a crucial data mining technique with a wide range of applications, including IT operations. Currently, the majority of anomaly detection methods for time series data rely on unsupervised approaches due to the rarity of anomaly labels. However, in real-world scenarios, obtaining a limited number of anomaly labels is feasible and affordable. Effective use of these labels can offer valuable insights into the temporal characteristics of anomalies and play a pivotal role in guiding anomaly detection efforts. To improve the performance of multivariate time series anomaly detection, we propose a novel deep learning model named EDD (Encoder-Decoder-Discriminator) that leverages limited anomaly samples. The EDD model integrates a graph attention network with long short-term memory (LSTM) to extract spatial and temporal features from multivariate time series data, enabling the model to capture complex patterns and dependencies within the data. Additionally, the model maps series data into a latent space, using a carefully crafted loss function to cluster normal data tightly while dispersing abnormal data randomly. This design yields distinct probability distributions for normal and abnormal data in the latent space, enabling precise identification of anomalous data. To evaluate the performance of the EDD model, we conducted extensive experimental validation across three diverse datasets. The results demonstrate the superiority of our model in multivariate time series anomaly detection: its average F1-score outperformed the second-best method by 2.7% and 73.4% under the two evaluation approaches, respectively.
These findings validate the effectiveness of our proposed EDD model in leveraging limited anomaly samples for accurate and robust anomaly detection in multivariate time series data.
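The latent-space design described above (normal data clustered tightly, anomalies dispersed) implies a simple distance-based score at inference time. The sketch below takes the encoder's embeddings as given and is only a hypothetical illustration of that scoring idea; the EDD architecture itself is not reproduced:

```python
import numpy as np

def latent_anomaly_score(z, z_normal):
    """Distance-based score in the spirit of EDD's latent-space design:
    normal embeddings cluster tightly, so distance from the normal
    centroid serves as an anomaly score. The encoder producing `z` is
    assumed; embeddings are supplied directly here.
    """
    centroid = z_normal.mean(axis=0)
    return np.linalg.norm(z - centroid, axis=-1)

rng = np.random.default_rng(1)
z_normal = rng.normal(0.0, 0.1, size=(200, 8))  # tight normal cluster
z_anom = rng.normal(3.0, 1.0, size=(5, 8))      # dispersed anomalies
s_norm = latent_anomaly_score(z_normal, z_normal).mean()
s_anom = latent_anomaly_score(z_anom, z_normal).mean()
print(s_anom > s_norm)  # anomalies sit far from the normal centroid: True
```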
- Research Article
21
- 10.1016/j.inffus.2024.102255
- Jan 14, 2024
- Information Fusion
The detection of anomalies in multivariate time series data is crucial for various practical applications, including smart power grids, traffic flow forecasting, and industrial process control. However, real-world time series data are usually not well structured, posing significant challenges to existing approaches: (1) missing values along the variable and time dimensions hinder the effective modeling of interwoven spatial and temporal dependencies, so important patterns are overlooked during model training; (2) anomaly scoring with irregularly sampled observations is less explored, making it difficult to apply existing detectors to multivariate series without fully observed values. In this work, we introduce a novel framework called GST-Pro, which uses a graph spatiotemporal process and an anomaly scorer to tackle these challenges in detecting anomalies in irregularly sampled multivariate time series. Our approach comprises two main components. First, we propose a graph spatiotemporal process based on neural controlled differential equations, which enables effective modeling of multivariate time series from both spatial and temporal perspectives, even when the data contain missing values. Second, we present a novel distribution-based anomaly scoring mechanism that alleviates the reliance on complete, uniformly sampled observations: by analyzing the predictions of the graph spatiotemporal process, anomalies can be detected easily. Our experimental results show that GST-Pro effectively detects anomalies in time series data and outperforms state-of-the-art methods, regardless of whether missing values are present. Our code is available at https://github.com/huankoh/GST-Pro.
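The distribution-based scoring idea can be sketched independently of the graph spatiotemporal process: given a predictive mean and standard deviation for each observation, score it by its Gaussian negative log-likelihood. The forecast moments here are assumed inputs for illustration, not GST-Pro's actual outputs:

```python
import numpy as np

def gaussian_nll_score(y_obs, y_pred_mean, y_pred_std):
    """Distribution-based anomaly score: negative log-likelihood of an
    observation under a Gaussian predictive distribution. Observations
    that are implausible under the forecast receive high scores.
    """
    var = y_pred_std ** 2
    return 0.5 * (np.log(2 * np.pi * var) + (y_obs - y_pred_mean) ** 2 / var)

# An observation far outside the forecast distribution scores higher.
mean, std = 0.0, 1.0
print(gaussian_nll_score(0.1, mean, std) < gaussian_nll_score(6.0, mean, std))  # True
```

Because the score only needs the forecast at whichever time stamps were actually observed, it does not require a complete, uniformly sampled series.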
- Research Article
30
- 10.1016/j.envpol.2020.115794
- Oct 12, 2020
- Environmental Pollution
Direct and cross impacts of upwind emission control on downwind PM2.5 under various NH3 conditions in Northeast Asia
- Research Article
31
- 10.1002/wics.1550
- Feb 7, 2021
- Wiley interdisciplinary reviews. Computational statistics
Second-order source separation (SOS) is a data analysis tool which can be used for revealing hidden structures in multivariate time series data or as a tool for dimension reduction. Such methods are nowadays increasingly important as more and more high-dimensional multivariate time series data are measured in numerous fields of applied science. Dimension reduction is crucial, as modeling such high-dimensional data with multivariate time series models is often impractical: the number of parameters describing dependencies between the component time series is usually too high. SOS methods have their roots in the signal processing literature, where they were first used to separate source signals from an observed signal mixture. The SOS model assumes that the observed time series (signals) are a linear mixture of latent time series (sources) with uncorrelated components. The methods make use of second-order statistics, hence the name "second-order source separation." In this review, we discuss the classical SOS methods and their extensions to more complex settings. An example illustrates how SOS can be performed. This article is categorized under: Statistical Models > Time Series Models; Statistical and Graphical Methods of Data Analysis > Dimension Reduction; Data: Types and Structure > Time Series, Stochastic Processes, and Functional Data.
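The classical SOS recipe can be sketched with AMUSE-style estimation: whiten the observed mixture, then eigendecompose a symmetrized lagged autocovariance of the whitened data. The AR(1) sources and the mixing matrix below are illustrative assumptions, not an example from the review:

```python
import numpy as np

def amuse(X, lag=1):
    """AMUSE, a classical SOS method: whiten the observed mixture, then
    eigendecompose the symmetrized lag-`lag` autocovariance to estimate
    the sources. X is (p, T): p observed series over T time steps.
    """
    Xc = X - X.mean(axis=1, keepdims=True)
    C0 = Xc @ Xc.T / Xc.shape[1]
    d, E = np.linalg.eigh(C0)
    W = E @ np.diag(d ** -0.5) @ E.T            # whitening matrix
    Z = W @ Xc
    Ct = Z[:, lag:] @ Z[:, :-lag].T / (Z.shape[1] - lag)
    Cs = (Ct + Ct.T) / 2                        # symmetrized lagged autocovariance
    _, U = np.linalg.eigh(Cs)
    return U.T @ Z                              # estimated latent sources

# Two AR(1) sources with distinct autocorrelations, linearly mixed.
rng = np.random.default_rng(5)
T = 2000
s = np.zeros((2, T))
for t in range(1, T):
    s[:, t] = np.array([0.9, -0.5]) * s[:, t - 1] + rng.normal(size=2)
X = np.array([[1.0, 0.6], [0.4, 1.0]]) @ s      # observed mixture
S_hat = amuse(X)
corr = np.abs(np.corrcoef(np.vstack([s, S_hat]))[:2, 2:])
print((corr.max(axis=1) > 0.9).all())  # each source recovered up to sign/order
```

Separation succeeds here because the two sources have well-separated lag-1 autocorrelations (0.9 vs. -0.5), the identifiability condition AMUSE relies on.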
- Research Article
32
- 10.1061/(asce)ps.1949-1204.0000553
- May 12, 2021
- Journal of Pipeline Systems Engineering and Practice
Pipe material and labor costs constitute about 70% of pipeline construction costs, and both are subject to considerable fluctuations over time. These fluctuations are problematic for cost estimation and bid preparation in pipeline projects, which are mostly large, long-term projects. Accurate prediction of pipe and labor costs is invaluable for cost estimators preparing bids and managing cost contingencies. However, the existing literature does not take advantage of the leading indicators of pipeline construction cost time series to forecast cost fluctuations in pipeline projects. The objective of this research is to identify the leading indicators of pipeline construction costs and develop multivariate time series models for forecasting cost fluctuations in pipeline projects. Nineteen potential leading indicators of pipe and labor costs were initially selected based on a comprehensive review of the construction cost forecasting literature. Leading indicators were then identified from this pool using unit root tests and Granger causality tests. Multivariate time series models were developed based on the results of cointegration tests: vector error correction (VEC) models for the cointegrated variables and vector autoregressive (VAR) models for the non-cointegrated variables. Because multivariate time series models incorporate information from the identified leading indicators, they are expected to deliver more accurate forecasts than univariate models. The forecasting accuracies of the multivariate models were compared with those of univariate models using three common error measures: mean absolute percentage error (MAPE), root-mean-squared error (RMSE), and mean absolute error (MAE).
The results show that multivariate time series models outperform univariate models for forecasting cost fluctuations in pipeline projects. This research contributes to the state of knowledge by identifying leading indicators of pipe and labor costs and by developing multivariate time series models to forecast them. The proposed models are expected to enhance the theory and practice of pipeline construction cost forecasting and to help cost engineers and investment planners prepare more accurate bids, cost estimates, and budgets for pipeline projects.
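The forecast-comparison step (and the Diebold-Mariano test used elsewhere in this collection) can be sketched directly. The simulated "multivariate" and "univariate" forecasts are illustrative stand-ins, and the DM statistic below omits the autocovariance correction used for multi-step horizons:

```python
import numpy as np

def diebold_mariano(e1, e2):
    """Diebold-Mariano statistic for equal predictive accuracy under
    squared-error loss (one-step-ahead, no autocovariance correction).
    Large negative values favor forecast 1; large positive favor 2.
    """
    d = e1 ** 2 - e2 ** 2
    return d.mean() / np.sqrt(d.var(ddof=1) / len(d))

def mape(a, f): return np.mean(np.abs((a - f) / a)) * 100  # percent
def rmse(a, f): return np.sqrt(np.mean((a - f) ** 2))
def mae(a, f): return np.mean(np.abs(a - f))

rng = np.random.default_rng(2)
actual = 10 + rng.normal(size=300)
f_multi = actual + rng.normal(0, 0.2, size=300)  # stand-in: small errors
f_uni = actual + rng.normal(0, 1.0, size=300)    # stand-in: larger errors
dm = diebold_mariano(actual - f_multi, actual - f_uni)
print(dm < 0, rmse(actual, f_multi) < rmse(actual, f_uni))  # (True, True)
```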
- Research Article
330
- 10.1109/tkde.2019.2954510
- Dec 5, 2019
- IEEE Transactions on Knowledge and Data Engineering
Air quality forecasting has been regarded as a key problem in air pollution early warning and control management. In this article, we propose a novel deep learning model for air quality (mainly PM2.5) forecasting, which learns the spatial-temporal correlation features and interdependence of multivariate air quality-related time series data through a hybrid deep learning architecture. Owing to the nonlinear and dynamic characteristics of multivariate air quality time series data, the base modules of our model are one-dimensional convolutional neural networks (1D-CNNs) and bi-directional long short-term memory networks (Bi-LSTM): the former extracts local trend and spatial correlation features, while the latter learns spatial-temporal dependencies. We then design a joint hybrid deep learning framework based on 1D-CNNs and Bi-LSTM for shared representation learning of multivariate air quality-related time series data. We conduct extensive experimental evaluations on two real-world datasets, and the results show that our model handles PM2.5 air pollution forecasting with satisfactory accuracy.
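The role of the 1D-CNN stage (extracting local trend features per series) can be sketched with a plain filter bank; the Bi-LSTM stage that models temporal dependencies is omitted here, and the kernels below are hand-picked gradient/smoothing filters rather than learned ones:

```python
import numpy as np

def conv1d_features(x, kernels):
    """Local-trend feature extraction: valid-mode 1D convolution of each
    series with a bank of kernels, i.e. the role played by the 1D-CNN
    stage of the hybrid model (the Bi-LSTM stage is omitted).

    x: (T, C) multivariate window; kernels: (K, k) filter bank.
    Returns (T - k + 1, C, K) feature maps.
    """
    T, C = x.shape
    K, k = kernels.shape
    out = np.empty((T - k + 1, C, K))
    for c in range(C):
        for j in range(K):
            # Reverse the kernel so np.convolve computes cross-correlation.
            out[:, c, j] = np.convolve(x[:, c], kernels[j][::-1], mode="valid")
    return out

x = np.random.default_rng(3).normal(size=(24, 3))  # 24 steps, 3 pollutant series
kernels = np.array([[1, 0, -1], [1, 1, 1]]) / 3.0  # gradient and smoothing filters
feats = conv1d_features(x, kernels)
print(feats.shape)  # (22, 3, 2)
```

In the full model these feature maps would feed the Bi-LSTM, which then learns dependencies across time in both directions.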
- Research Article
123
- 10.1016/j.neuroimage.2009.12.110
- Jan 7, 2010
- NeuroImage
A new Kalman filter approach for the estimation of high-dimensional time-variant multivariate AR models and its application in analysis of laser-evoked brain potentials
- Research Article
- 10.1108/ijwis-04-2024-0119
- Dec 26, 2024
- International Journal of Web Information Systems
Purpose – The proposed model aims to tackle the data quality issues in multivariate time series caused by missing values. It preserves data set integrity by accurately imputing missing data, ensuring reliable analysis outcomes.
Design/methodology/approach – The Conv-DMSA model employs a combination of self-attention mechanisms and convolutional networks to handle the complexities of multivariate time series data. The convolutional network is adept at learning features across uneven time intervals through an imputation feature map, while the Diagonal Mask Self-Attention (DMSA) block is specifically designed to capture time dependencies and feature correlations. This dual approach allows the model to address the temporal imbalance, feature correlation, and time dependency challenges that are often overlooked by traditional imputation models.
Findings – Extensive experiments conducted on two public data sets and a real project data set demonstrate the adaptability and effectiveness of the Conv-DMSA model for imputing missing data. The model outperforms baseline methods, reducing the root mean square error (RMSE) by 37.2% to 63.87% compared with other models, indicating its enhanced accuracy and efficiency in handling missing data in multivariate time series.
Originality/value – The Conv-DMSA model introduces a unique combination of convolutional networks and self-attention mechanisms to the field of missing data imputation. Its use of a diagonal mask within the self-attention block allows a more nuanced treatment of the data's temporal and relational aspects. This approach addresses the shortcomings of conventional imputation methods and sets a new standard for handling missing data in complex multivariate time series data sets. The model's superior performance and its capacity to adapt to varying levels of missing data make it a significant contribution to the field.
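The diagonal-mask idea can be sketched in a single-head, projection-free form: each time step is reconstructed only from the other steps, which is what makes the mechanism usable for imputation. Queries, keys, and values are taken to be the raw inputs here, a simplification of the actual Conv-DMSA block:

```python
import numpy as np

def diagonal_mask_self_attention(X):
    """Single-head self-attention with the diagonal masked out, so each
    time step is reconstructed from the *other* steps only (the idea
    behind the DMSA block). Projections are omitted for brevity:
    queries = keys = values = X.
    """
    scores = X @ X.T / np.sqrt(X.shape[1])
    np.fill_diagonal(scores, -np.inf)           # a step cannot attend to itself
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)           # row-wise softmax
    return w @ X, w

X = np.random.default_rng(4).normal(size=(6, 4))  # 6 time steps, 4 features
out, weights = diagonal_mask_self_attention(X)
print(out.shape, np.allclose(np.diag(weights), 0.0))  # (6, 4) True
```

Because the diagonal weights are exactly zero, the output at a masked (missing) position is a convex combination of the observed steps, which can serve as its imputed value.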
- Conference Article
3
- 10.1109/mdm52706.2021.00017
- Jun 1, 2021
Fault prediction is critically important for mobile equipment such as vehicles, ships, and spacecraft. Sensors deployed on this equipment continuously collect status data, which are usually multivariate time series. Accurately predicting equipment failures from the generated multivariate time series is challenging due to the complex correlations among the variables and the dynamic operating conditions. Though many methods have been proposed, they do not provide interpretable and accurate fault prediction results. This paper proposes a two-stage Interpretable Fault Prediction method based on Anomaly Detection and Anomaly Accumulation, called IFP-ADAC. Specifically, we first design an anomaly detection module based on generative adversarial nets, motivated by the lack of fault samples; the generator jointly captures the correlations among multiple variables and the temporal dependency within each variable. Second, we design an anomaly accumulation model based on LSTM to capture the anomaly growth pattern, introducing an attention mechanism to account for the severity of the detected anomalies. Compared with end-to-end methods, our two-stage fault prediction method based on anomaly detection and accumulation has better interpretability. Extensive experiments conducted on two real-world datasets show the superior performance of our method.
- Preprint Article
- 10.5194/egusphere-egu21-15389
- Mar 4, 2021
<p>The WMO World Weather Research Programme (WWRP) “promotes international and interdisciplinary research for more accurate and reliable forecasts from minutes to seasons, expanding the frontiers of weather science to enhance society’s resilience to high-impact weather and the value of weather information for users.” In the 2016-2023 WWRP implementation plan, activities focus on four challenges: high-impact weather, water, urbanization, and evolving technologies. Furthermore, the WMO Global Atmosphere Watch Urban Research Meteorology and Environment (GURME) project focuses on the development of models and associated research activities to enhance capabilities in urban-environmental forecasting and air quality services, illustrating the linkages between meteorology and air quality (https://public.wmo.int/en/programmes).</p><p>This talk presents an international Research Demonstration Project (RDP) that will focus on international research on scientific urban issues addressed by both WWRP and GURME. The strategic objective of this RDP is to focus on the Olympic Games of Paris in 2024 in order to advance research on the theme of “future meteorological forecasting systems at 100 m (or finer) resolution for urban areas”. Such systems would prefigure numerical weather prediction at the horizon of 2030.
The focus will be on themes related to extreme summer weather events that both are influenced by and impact urbanization: thunderstorms and strong urban heat islands, and their consequences.</p><p>Five scientific questions will be addressed during this Paris RDP:</p><ul><li>Nowcasting and numerical weather prediction in cities at around 100 m resolution</li> <li>High-resolution thunderstorm nowcasting (probabilistic and deterministic) in the urban environment; urban heat islands, cool areas, and air quality</li> <li>Nowcasting and forecasting in coastal cities (for the Marseille site)</li> <li>How to improve and better use observational networks in urban areas, including (big) non-conventional data</li> <li>Conception and communication of tailored weather, climate, and environmental information at infra-urban resolution</li></ul><p>Several high-impact weather case studies were selected. Storm cases (starting with one on 10 July 2017) will allow evaluation of the role of the urban area in their enhancement. An extreme heat wave aggravated by a strong urban heat island is also studied (July 2019). Open urban data describing the agglomerations at very high resolution are provided. New innovative methods to produce maps of urban-form characteristics (e.g., from street images) and meteorological data (from personal meteorological stations) will be explored.</p><p>This talk will describe these scientific questions, as well as the common methodological approach being discussed among the partners. A focus will be the international experimental campaign that will take place in 2022 over the Paris agglomeration, with an intensive observation period in summer 2022. Interactions between the urban surface and the atmospheric boundary layer, interactions between air quality and aerosols in city and biogenic plumes, and the local effect of urban trees on micro-climate and chemistry are some of the axes of the campaign.
It will provide additional meteorological and air quality observations, both to help improve the nowcasting and NWP systems at urban scale and to help define the required additional instrumentation that should be deployed during the Olympic Games themselves.</p>
- Preprint Article
1
- 10.5194/ems2024-723
- Aug 16, 2024
Science communication and dissemination play a key role in research projects aimed at addressing critical societal challenges such as climate change and air pollution. By effectively conveying scientific knowledge to diverse audiences, these efforts contribute to transformative change and foster public awareness and engagement. An essential aspect of effective science communication is the strategic selection of communication formats and channels tailored to the needs and preferences of different audiences. In this poster we pay special attention to the role of innovative approaches such as storytelling, art, and visual elements in communication and engagement strategies. These elements are key to evoking emotions, appealing to the audience's interest, and enhancing learning by making complex scientific information more accessible and relatable. In addition, data visualisation, art, and multimedia representations are important components in making scientific information more accessible and understandable to non-specialist audiences. This poster illustrates several examples of communication products and activities that have proven effective and innovative in explaining complex issues related to climate and air quality services, among them:
The use of scrolling narratives and augmented reality as communication and education tools, an effective way to raise awareness and allow users to interact with the basic concepts of air quality. These innovative approaches not only inform users about the intricate dynamics of air quality but also demonstrate how such knowledge can inform regulatory and planning decisions. Through immersive experiences, individuals gain insight into the complexity of environmental issues and are empowered to take informed action towards sustainable solutions.
A scientific digital comic that introduces the science of climate services co-production to a broad audience, together with the importance of such services for climate change adaptation and mitigation.
A theatre play developed through a co-production process between artists and scientists, which helps increase the scientific literacy of the general public on how climate science is done and what its opportunities and limitations are.
A roadshow showcasing a digital art project in five different locations in Southeast Europe, exploring the potential of data, research, and climate services. This innovative approach aims to broaden the community of climate service users in this currently under-represented European region, while increasing the resilience of the area through the promotion of climate services.
- Research Article
57
- 10.1016/s1876-3804(21)60016-2
- Feb 1, 2021
- Petroleum Exploration and Development
Production performance forecasting method based on multivariate time series and vector autoregressive machine learning model for waterflooding reservoirs