A Reliable Framework for Human-in-the-Loop Anomaly Detection in Time Series

Abstract

Time series anomaly detection is a critical machine learning task for numerous applications, such as finance, healthcare, and industrial systems. However, even high-performing models may exhibit issues such as biases, leading to unreliable outcomes and misplaced confidence. While model explanation techniques, particularly visual explanations, offer valuable insights by elucidating the attributions behind a model's decisions, they still have important limitations: they are primarily instance-based and do not scale across a dataset, and they provide one-directional information from the model to the human, lacking a mechanism for users to address detected issues. To fill these gaps, we introduce HILAD, a novel framework designed to foster dynamic, bidirectional collaboration between humans and AI for enhancing anomaly detection models in time series. Through our visual interface, HILAD empowers domain experts to detect, interpret, and correct unexpected model behaviors at scale. Our evaluation through user studies with two models and three time series datasets demonstrates the effectiveness of HILAD, which fosters deeper model understanding, immediate corrective actions, and enhanced model reliability.

Similar Papers
  • Research Article
  • 10.5075/epfl-thesis-4688
Robust Multivariate and Nonlinear Time Series Models
  • Jan 1, 2010
  • Ravi Ramakrishnan

Time series modeling and analysis is central to most financial and econometric data modeling. With increased globalization in trade, commerce, and finance, national variables such as gross domestic product (GDP) and the unemployment rate, market variables such as indices and stock prices, and global variables such as commodity prices are more tightly coupled than ever before. This calls for multivariate (vector) time series models and algorithms to analyze and understand the relationships among these variables. Autocorrelation is one of the fundamental aspects of time series modeling. However, traditional linear models, which arise from the strong autocorrelation observed in many financial and econometric time series, are at times unable to capture the nonlinear relationships that characterize many time series. This necessitates the study of nonlinear models for such series. The class of bilinear models is one of the simplest classes of nonlinear models. These models can capture the temporary erratic fluctuations that are common in many financial return series and are therefore of great interest in financial time series analysis. Another aspect of time series analysis is homoscedasticity versus heteroscedasticity. Many time series, even after differencing, exhibit heteroscedasticity, so it becomes important to incorporate this feature in the associated models. Autoregressive conditional heteroscedastic (ARCH) models and their variants form the backbone of conditional heteroscedastic time series modeling. Robustness is a highly underrated feature of most time series models and applications currently used in industry. With an ever-increasing amount of information available for modeling, it is not uncommon for the data to contain aberrations such as level shifts and occasional large fluctuations. Conventional methods such as maximum likelihood and least squares are well known to be highly sensitive to such contamination. Hence it is important to use robust methods that account for such aberrations, especially now that ample computing power is readily available. While robustness and time series modeling have each been researched extensively, the application of robust methods to the estimation of time series models remains largely open. The central goal of this thesis is the study of robust parameter estimation for some simple vector and nonlinear time series models. More precisely, we briefly study some prominent linear and nonlinear models in the time series literature and apply the robust S-estimator to estimate the parameters of simple models such as the vector autoregressive (VAR) model, the (0, 0, 1, 1) bilinear model, and a simple conditional heteroscedastic bilinear model. In each case, we examine the stationarity of the model and analyze the asymptotic behavior of the S-estimator.
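
The S-estimator can be illustrated in its simplest setting. The sketch below assumes a univariate AR(1) model (a one-dimensional VAR(1)) and estimates its coefficient by brute-force grid search for the value that minimizes a robust M-scale of the residuals, which is the defining property of an S-estimate. The Tukey bisquare ρ with c = 1.547 and b = 0.5 (a common 50%-breakdown choice), the grid, the simulated data, and all function names are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np

C, B = 1.547, 0.5  # assumed bisquare tuning constant and b = E[rho(Z)] for 50% breakdown

def rho_bisquare(u):
    """Tukey bisquare rho, normalized so that rho = 1 for |u| >= C."""
    z = np.minimum(np.abs(u) / C, 1.0)
    return 1.0 - (1.0 - z**2) ** 3

def m_scale(r, tol=1e-8, max_iter=200):
    """Solve mean(rho(r / s)) = B for the scale s by fixed-point iteration."""
    s = np.median(np.abs(r)) / 0.6745  # MAD-based starting value
    if s == 0.0:
        return 0.0
    for _ in range(max_iter):
        s_new = s * np.sqrt(np.mean(rho_bisquare(r / s)) / B)
        if abs(s_new - s) <= tol * s:
            return s_new
        s = s_new
    return s

def s_estimate_ar1(x, grid=np.linspace(-0.99, 0.99, 397)):
    """Hypothetical helper: S-estimate of phi in x_t = phi * x_{t-1} + e_t by grid search."""
    y, z = x[1:], x[:-1]
    scales = [m_scale(y - phi * z) for phi in grid]
    return float(grid[int(np.argmin(scales))])

def ols_ar1(x):
    """Ordinary least-squares estimate of phi, for comparison."""
    y, z = x[1:], x[:-1]
    return float(z @ y / (z @ z))

# Simulated AR(1) with phi = 0.6, then 1% additive outliers of size +10 (illustrative data).
rng = np.random.default_rng(0)
n, phi = 2000, 0.6
e = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + e[t]
x_obs = x.copy()
x_obs[rng.choice(n, size=20, replace=False)] += 10.0

phi_ols, phi_s = ols_ar1(x_obs), s_estimate_ar1(x_obs)
print(f"OLS: {phi_ols:.3f}  S-estimate: {phi_s:.3f}")
```

With these additive outliers, the least-squares coefficient is pulled toward zero, while the S-estimate stays close to the true value of 0.6. A grid search is only practical here because the model has a single parameter; multivariate models need dedicated S-estimation algorithms.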

  • Research Article
  • Cited by 152
  • 10.1093/ije/dyl162
Methods for monitoring influenza surveillance data
  • Aug 22, 2006
  • International Journal of Epidemiology
  • Benjamin J Cowling + 4 more

A variety of Serfling-type statistical algorithms requiring long series of historical data, exclusively from temperate climate zones, have been proposed for automated monitoring of influenza sentinel surveillance data. We evaluated three alternative statistical approaches where alert thresholds are based on recent data in both temperate and subtropical regions. We compared time series, regression, and cumulative sum (CUSUM) models on empirical data from Hong Kong and the US using a composite index (range = 0-1) consisting of the key outcomes of sensitivity, specificity, and time to detection (lag). The index was calculated based on alarms generated within the first 2 or 4 weeks of the peak season. We found that the time series model was optimal in the Hong Kong setting, while both the time series and CUSUM models worked equally well on US data. For alarms generated within the first 2 weeks (4 weeks) of the peak season in Hong Kong, the maximum values of the index were: time series 0.77 (0.86); regression 0.75 (0.82); CUSUM 0.56 (0.75). In the US data the maximum values of the index were: time series 0.81 (0.95); regression 0.81 (0.91); CUSUM 0.90 (0.94). Automated influenza surveillance methods based on short-term data, including time series and CUSUM models, can generate sensitive, specific, and timely alerts, and can offer a useful alternative to Serfling-like methods that rely on long-term, historically based thresholds.
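
To illustrate how a CUSUM alert threshold can be based on recent data rather than a long historical baseline, the sketch below standardizes each week's count against the mean and standard deviation of the preceding weeks and accumulates positive excesses. The abstract does not give the authors' exact specification, so the 8-week window, the reference value k, the decision threshold h, the function name, and the weekly counts are illustrative assumptions.

```python
import numpy as np

def cusum_alarms(counts, baseline_weeks=8, k=0.5, h=2.0):
    """One-sided upper CUSUM on counts standardized against a moving recent baseline.

    Returns one boolean per week; the first `baseline_weeks` weeks cannot alarm
    because no baseline exists for them yet.
    """
    counts = np.asarray(counts, dtype=float)
    alarms = [False] * baseline_weeks
    s = 0.0
    for t in range(baseline_weeks, len(counts)):
        window = counts[t - baseline_weeks:t]      # only the most recent weeks
        mu, sd = window.mean(), window.std(ddof=1)
        z = (counts[t] - mu) / max(sd, 1e-9)       # guard against a flat window
        s = max(0.0, s + z - k)                    # accumulate excess above k
        alarms.append(bool(s > h))
    return alarms

weekly = [10, 11, 9, 10, 12, 10, 9, 11, 10, 11, 9, 10, 30, 35, 40]
flags = cusum_alarms(weekly)
print([t for t, f in enumerate(flags) if f])  # prints [12, 13, 14]
```

Raising k or h trades sensitivity and timeliness for specificity, which is the balance the study's composite index summarizes.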

  • Research Article
  • Cited by 14
  • 10.1111/1462-2920.16017
Time after time: detecting annual patterns in stream bacterial biofilm communities.
  • May 1, 2022
  • Environmental Microbiology
  • Anju Gautam + 2 more

To quantify the major environmental drivers of stream bacterial population dynamics, we modelled temporal differences in stream bacterial communities to quantify community shifts, including those relating to cyclical seasonal variation and more sporadic bloom events. We applied Illumina MiSeq 16S rRNA bacterial gene sequencing to 892 stream biofilm samples, collected monthly for 36 months from six streams. The streams were located a maximum of 118 km apart and drained three different catchment types (forest, urban, and rural land uses). We identified repeatable seasonal patterns among bacterial taxa, allowing their separation into three ecological groupings: those following linear, bloom/trough, and repeated seasonal trends. Various physicochemical parameters (light, water and air temperature, pH, dissolved oxygen, nutrients) were linked to temporal community changes. Our models indicate that bloom events and seasonal episodes modify biofilm bacterial populations, suggesting that distinct microbial taxa, including non‐cyanobacterial community members, thrive during these events. These models could aid in determining how temporal environmental changes affect community assembly and guide the selection of appropriate statistical models to capture future community responses to environmental change.

  • Research Article
  • Cited by 39
  • 10.1016/j.ins.2012.08.028
Granular modelling of signals: A framework of Granular Computing
  • Sep 11, 2012
  • Information Sciences
  • Adam Gacek

  • Research Article
  • 10.1111/j.2517-6161.1989.tb01436.x
Discussion of the Paper by Bruce and Martin
  • Jul 1, 1989
  • Journal of the Royal Statistical Society Series B: Statistical Methodology

  • Dissertation
  • 10.6092/unibo/amsdottorato/9328
Essays in Robust and Nonlinear Time Series Models.
  • Apr 2, 2020
  • Enzo D’Innocenzo

This PhD dissertation deals with multivariate time series models in which the behaviour of the observed process is described by a time-varying parameter. In particular, the thesis explores three dynamic multivariate nonlinear models that can handle multivariate time series gathered from heavy-tailed phenomena. Despite the popularity of linear and univariate time series models, empirical evidence has shown that variables generated by complex phenomena are typically interrelated both contemporaneously and across time. This is the case in several fields of science, such as economics, finance, biology, and physics, where it is widely accepted that a univariate approach struggles to give a satisfactory representation of reality or to make good predictions about the future. For these reasons, linear multivariate Gaussian time series models have received increasing attention in the literature. However, these models are known to perform poorly when the collected data are contaminated by outliers, yielding biased estimates and unreliable forecasts. When the observed data depart from the hypothesis of normality, it is reasonable to move to nonlinear or non-Gaussian time series models. Unfortunately, despite recent technological developments, estimating nonlinear time series models can be very challenging, since it requires simulation-based and computer-intensive methods. In addition, the statistical properties of such estimators are not always easy to derive. This thesis contributes to the literature by defining dynamic multivariate heavy-tailed models that are relatively simple. The emphasis is on models that are analytically tractable and can easily be estimated by maximum likelihood. For each model, a detailed statistical and asymptotic analysis is provided. Their practical usefulness is highlighted with several simulation studies and empirical applications.

  • Discussion
  • Cited by 17
  • 10.1016/j.jhydrol.2020.124614
Discussion of “Comparative assessment of time series and artificial intelligence models to estimate monthly streamflow: A local and external data analysis approach” by Saeid Mehdizadeh, Farshad Fathian, Mir Jafar Sadegh Safari and Jan F. Adamowski
  • Jan 24, 2020
  • Journal of Hydrology
  • Isa Ebtehaj + 2 more

  • Book Chapter
  • Cited by 2
  • 10.1007/978-3-030-66259-2_2
Feature Extraction in Time Domain for Stationary Data
  • Jan 1, 2021
  • Alireza Entezami

A time series is a sequence of data points that typically consists of successive measurements taken at regular time intervals. Time series analysis is a statistical tool for analyzing such measurements for initial data analysis, model identification, parameter estimation, model validation, feature extraction, and forecasting. In the context of structural health monitoring (SHM), time series analysis is a powerful method for feature extraction, in which statistical properties of the time series data, or of a time series model fitted to them, are extracted as the main damage-sensitive features. Selecting an appropriate model class is an important step in time series analysis; the choice depends on the type and nature of the data and on whether input (excitation) data are available. When vibration measurements are linear and stationary, time-invariant linear models are the most suitable choices for feature extraction. The most crucial issue in time series modeling is determining an adequate order, one that makes the fitted model generate uncorrelated residuals. This chapter introduces the fundamental principles of time series analysis and modeling for feature extraction. Moreover, the proposed methods for model identification, order determination, and feature extraction are discussed in detail.
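
As a hedged illustration of order determination, the sketch below fits autoregressive (AR) models of increasing order by least squares to a simulated stationary record, treats the fitted coefficients as candidate features, and checks whether the residuals look uncorrelated. The AR(2) test signal, the ±2/√n band, and the function names are assumptions for illustration; the chapter's own procedures may differ.

```python
import numpy as np

def fit_ar(x, p):
    """Least-squares AR(p) fit; returns (coefficients, residuals)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    # Column j holds the series lagged by j + 1 steps, aligned with y = x[p:].
    X = np.column_stack([x[p - j - 1:len(x) - j - 1] for j in range(p)])
    y = x[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, y - X @ coef

def residual_acf(r, lag):
    """Sample autocorrelation of the residuals at the given lag (lag >= 1)."""
    r = r - r.mean()
    return float(r[lag:] @ r[:-lag] / (r @ r))

# Simulated stationary AR(2) record standing in for a vibration signal (assumption).
rng = np.random.default_rng(1)
n, burn = 5000, 200
e = rng.standard_normal(n + burn)
x = np.zeros(n + burn)
for t in range(2, n + burn):
    x[t] = 0.5 * x[t - 1] + 0.3 * x[t - 2] + e[t]
x = x[burn:]  # drop the start-up transient

bound = 2 / np.sqrt(n)  # rough 95% band for white-noise autocorrelations
for p in (1, 2, 3):
    coef, res = fit_ar(x, p)
    r1 = residual_acf(res, 1)
    print(p, np.round(coef, 3), round(r1, 3), abs(r1) < bound)
```

Here the under-fitted AR(1) leaves a clearly negative lag-1 residual autocorrelation, while AR(2) should bring it close to zero. Checking a single lag is only a rough screen; a portmanteau test over several lags is the more usual check.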

  • Research Article
  • Cited by 1
  • 10.2312/eurova.20151107
Integrating Predictions in Time Series Model Selection
  • Jan 1, 2015
  • Markus Bögl + 6 more

Time series appear in many different domains. The main goal in time series analysis is to find a model for a given time series. Time series models are usually selected iteratively, based on information criteria and residual plots. These sources may show only small variations, so it is necessary to consider prediction capabilities in the model selection process. Even when the model and its predictions are shown in an interactive visual interface, it is still difficult to compare deviations from actual values or from benchmark models. Judging which model fits a time series adequately is not well supported by current methods. We propose combining visual and analytical methods to integrate prediction capabilities into the model selection process and to assist in choosing an adequate and parsimonious model. In our approach, a visual interactive interface is used to select and adjust time series models, use the models' prediction capabilities, and compare the predictions of multiple models against the actual values.

  • Research Article
  • Cited by 38
  • 10.1007/s00521-016-2766-x
A comparison of time series and machine learning models for inflation forecasting: empirical evidence from the USA
  • Dec 17, 2016
  • Neural Computing and Applications
  • Volkan Ülke + 2 more

This study compares time series and machine learning models for inflation forecasting. Empirical evidence from the USA between 1984 and 2014 suggests that, out of sixteen conditions (four inflation indicators and four horizons), machine learning models provide more accurate forecasts in seven conditions and time series models are better in nine. Moreover, multivariate models give better results in fourteen conditions, and univariate models are better in only two. The study shows that machine learning models prevail over time series models for core personal consumption expenditure (core-PCE) inflation forecasting, whereas the time series model (ARDL) is better for core consumer price index (core-CPI) inflation forecasting at all horizons.

  • Conference Article
  • Cited by 2
  • 10.1145/3357613.3357615
Forecasting complex multi-component time series within systems designed to detect anomalies in dataflows of industrial automated systems
  • Sep 12, 2019
  • A N Ragozin + 2 more

The need for detection, identification, and prediction of anomalies emerged only recently. This task brings multiple challenges, mainly associated with recording and detecting novel attacks and influences, and it draws the attention of experts in network security and in the diagnostics of information and industrial systems. Detecting anomalies in dynamic dataflows largely determines the efficiency of computer network information security management within information and industrial systems. At the same time, dynamic dataflow prediction is also very important for building systems that detect anomalies in data protection within industrial control systems. All automated systems rely on available computing capabilities and on advances in management theory, mathematical modeling, and optimization methods. The processes that occur in industrial automated systems, and that are reflected in the observed dataflows, are complex multi-component processes, which makes them harder to predict. In this case, complex multi-component time series must be predicted at different (short-term, medium-term, and long-term) time scales, and a neural network architecture matching the structure of the multi-component time series should be built to generate a multi-component prediction. It is proposed to decompose the complex multi-component time series into several basic components using digital signal processing, i.e., to perform a preliminary structural analysis of the multi-component time series within the observed range of all time series that reflect the operation of the industrial control system. Separate predictions with different time horizons are formed for each basic component using the available neural network architecture and machine learning, taking into account the dynamic characteristics of these components. Anomalies in the observed range of the time series that reflect the operation of the industrial control system are then detected (identified) by comparing each component obtained from the preliminary digital processing with the corresponding prediction of that component. This component-wise comparison allows anomalies in the observed time series of the industrial control system to be detected separately according to their different dynamic characteristics, and thus improves the efficiency of information security management within information and industrial systems.

  • Research Article
  • Cited by 20
  • 10.2139/ssrn.808024
Copula-Based Dependence Characterizations and Modeling for Time Series
  • Sep 21, 2005
  • SSRN Electronic Journal
  • Rustam Ibragimov

This paper develops a new unified approach to copula-based modeling and characterizations for time series and stochastic processes. We obtain complete characterizations of many time series dependence structures in terms of copulas corresponding to their finite-dimensional distributions. In particular, we focus on copula-based representations for Markov chains of arbitrary order, m-dependent and r-independent time series, as well as martingales and conditionally symmetric processes. Our results provide new methods for modeling time series that have prescribed dependence structures, such as higher-order Markov processes as well as non-Markovian processes that nevertheless satisfy the Chapman-Kolmogorov stochastic equations. We also focus on the construction and analysis of new classes of copulas that have the flexibility to combine many different dependence properties for time series. Among other results, we study new classes of copulas based on expansions by linear functions (Eyraud-Farlie-Gumbel-Morgenstern copulas), power functions (power copulas), and Fourier polynomials (Fourier copulas), and introduce methods for modeling time series using these classes of dependence functions. We also study weak convergence of empirical copula processes in the time series context and obtain new results on the asymptotic Gaussianity of such processes for a wide class of beta-mixing sequences.
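
To make the linear-expansion (FGM) family concrete: the Farlie-Gumbel-Morgenstern copula is C(u, v) = uv[1 + θ(1 - u)(1 - v)] with θ in [-1, 1], and its Spearman's rho is θ/3. The sketch below, using our own choice of θ and hypothetical helper names, checks the copula's boundary conditions and simulates a stationary first-order Markov chain whose consecutive pairs follow this copula, by inverting the conditional distribution ∂C/∂u in closed form. It illustrates the general copula-based Markov construction and is not the paper's code.

```python
import numpy as np

def fgm_copula(u, v, theta):
    """Farlie-Gumbel-Morgenstern copula C(u, v), valid for -1 <= theta <= 1."""
    return u * v * (1.0 + theta * (1.0 - u) * (1.0 - v))

def fgm_conditional_inverse(p, u, theta):
    """Solve dC/du(u, v) = v + a*v*(1 - v) = p for v, where a = theta*(1 - 2u).

    Uses the rationalized root 2p / (b + sqrt(b^2 - 4ap)) with b = 1 + a,
    which stays stable as a -> 0 (where it reduces to v = p).
    """
    a = theta * (1.0 - 2.0 * u)
    b = 1.0 + a
    return 2.0 * p / (b + np.sqrt(b * b - 4.0 * a * p))

def simulate_fgm_markov_chain(n, theta, rng):
    """Uniform-margin Markov chain whose consecutive pairs have the FGM copula."""
    u = np.empty(n)
    u[0] = rng.uniform()
    for t in range(1, n):
        u[t] = fgm_conditional_inverse(rng.uniform(), u[t - 1], theta)
    return u

theta = 0.9
rng = np.random.default_rng(2)
u = simulate_fgm_markov_chain(20000, theta, rng)
# With uniform margins, the lag-1 Pearson correlation estimates Spearman's rho = theta / 3.
print(np.corrcoef(u[:-1], u[1:])[0, 1], theta / 3)
```

Because |θ| ≤ 1 keeps the FGM dependence weak (|ρ| ≤ 1/3), this family is mainly useful as a building block to be combined with other dependence properties.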

  • Research Article
  • Cited by 2
  • 10.15276/aait.05.2022.17
Modeling and forecasting of nonlinear nonstationary processes based on the Bayesian structural time series
  • Nov 1, 2022
  • Applied Aspects of Information Technology
  • Irina A Kalinina + 1 more

The article describes an approach to modelling and forecasting nonlinear non-stationary time series for various purposes using Bayesian structural time series (BSTS). The concepts of non-linearity and non-stationarity are considered, as well as methods for handling them when constructing forecasting models. The features of the Bayesian approach to handling non-linearity and non-stationarity are presented, and an approach to constructing probabilistic-statistical models based on Bayesian structural time series models is studied. Parametric and non-parametric methods for forecasting nonlinear and non-stationary time series are considered. Parametric methods include classical autoregressive models, neural networks, support vector machines, and hidden Markov models. Non-parametric methods include state-space models, functional decomposition models, and Bayesian non-parametric models; Bayesian structural time series are one type of non-parametric model. The main features of constructing structural time series are considered, and structural time series models are presented. The process of learning a Bayesian structural time series model is described. Training is performed in four stages: setting the structure of the model and the prior probabilities; applying a Kalman filter to update the state estimates based on observed data; applying the “spike-and-slab” method to select variables in the structural model; and Bayesian averaging to combine the results into a prediction. An algorithm for constructing a BSTS model is presented. Various components of the BSTS model, from which the structures of alternative predictive models are formed, are considered and analysed. As an example of applying Bayesian structural time series, the problem of predicting Amazon stock prices is considered. The base dataset is amzn_share. After loading, the structure and data types were analysed, and missing values were processed. The observations are recorded irregularly, which leads to a large number of missing values and masks possible seasonal fluctuations, making forecasting rather difficult. Linear interpolation was used to fill the gaps in the amzn_share time series. The series was tested for stationarity using a set of statistical tests (ADF, KPSS, PP). The data set was divided into training and testing parts. Structural time series models were fitted using the Kalman filter and Markov chain Monte Carlo (MCMC) sampling. The spike-and-slab method was applied to estimate and simultaneously regularize the regression coefficients. The quality of the predictive models was assessed.
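
Two of the steps listed above, linear interpolation of gaps and the Kalman-filter state update, can be sketched compactly. This is a minimal sketch assuming a single local-level component with fixed, known variances; in a BSTS model the variances and regression coefficients would instead be sampled by MCMC with a spike-and-slab prior, which is omitted here. The function names, variance values, and price series are illustrative assumptions, not the article's code or data.

```python
import numpy as np

def interpolate_gaps(y):
    """Fill NaN gaps by linear interpolation between the nearest observed points."""
    y = np.asarray(y, dtype=float)
    idx = np.arange(len(y))
    observed = ~np.isnan(y)
    return np.interp(idx, idx[observed], y[observed])

def local_level_filter(y, obs_var=1.0, level_var=0.1, m0=0.0, p0=1e6):
    """Kalman filter for y_t = mu_t + eps_t, mu_t = mu_{t-1} + eta_t.

    Returns the filtered level mean at each step; a NaN observation is treated
    as missing, so only the prediction step is applied there.
    """
    m, p = m0, p0                      # near-diffuse prior on the initial level
    means = np.empty(len(y))
    for t, obs in enumerate(np.asarray(y, dtype=float)):
        p = p + level_var              # predict: the level follows a random walk
        if not np.isnan(obs):
            k = p / (p + obs_var)      # Kalman gain
            m = m + k * (obs - m)      # move the level toward the observation
            p = (1.0 - k) * p
        means[t] = m
    return means

prices = [100.0, np.nan, 104.0, 103.0, np.nan, np.nan, 110.0]  # hypothetical data
print(interpolate_gaps(prices))
print(local_level_filter(prices).round(2))
```

The filter handles missing observations natively, so interpolation is not strictly required for this step. In a full BSTS model the state would also include trend, seasonal, and regression components.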

  • Research Article
  • 10.5075/epfl-thesis-3012
Un modèle factoriel dynamique pour séries temporelles (A Dynamic Factor Model for Time Series)
  • Jan 1, 2004
  • Andrei Zenide

This work deals with factor models for multiple time series. Its core content places it at the interface between statistics and finance. After a brief description of the historical link between the two sciences, it reviews the literature on factor models that are close to the model introduced in this work, called the dynamical factor analysis model for time series. This model assumes that the observed time series are influenced by a common factor that is difficult to define and impossible to measure. No a priori structure is imposed on the factor: at each point in time, the value of the factor is considered a new parameter to be estimated. As a consequence, the number of parameters is large, and it is not possible to obtain the usual asymptotic properties of the estimators by letting the number of periods tend to infinity. Asymptotic results in this context have instead been obtained by increasing the number of time series. The model assumes a linear dependence between the time series and the factor, with coefficients that are not constant over time but instead follow a smooth random walk. This means that the linear structure of the model evolves slowly from one period to the next. All information not contained in the factor and the coefficients is considered white noise. Using the normal distribution makes estimation easier and opens a toolbox of statistical methods developed for this kind of data. The model belongs to the family of state space models, and the Kalman filter is an essential ingredient of the estimator. The effort was concentrated on elaborating the structure of the model, whose complexity was constrained by the difficulty of estimation. The final form of the model does not allow an analytical solution of the optimization problem posed by maximum likelihood estimation of the parameters. Numerical solutions have been found and compared with the true parameters for simulated data. Several other models have been developed as simpler versions of the dynamical factor model for time series. The case where the factor can be observed has been studied, and a new estimation method has been provided and compared with existing methods. A second study considers a latent factor model without noise, for which two methods of estimating the factor are provided. The last chapter contains a detailed description of the main statistical tools used in this work, highlighting their links with the previous chapters, followed by comments.

  • Research Article
  • Cited by 122
  • 10.1016/s0165-7836(96)00482-1
Modelling and forecasting monthly fisheries catches: comparison of regression, univariate and multivariate time series methods
  • Jan 1, 1997
  • Fisheries Research
  • K.I Stergiou + 2 more
