Abstract

Monitoring of streamed data to detect abnormal behaviour (variously known as event detection, anomaly detection, change detection, or outlier detection) underlies many applications of the Internet of Things. There, one often collects data from a variety of sources, with asynchronous sampling and missing data. In this setting, one can detect abnormal behaviour using low-rank techniques. In particular, we assume that normal observations come from a low-rank subspace, prior to being corrupted by uniformly distributed noise. Correspondingly, we aim to recover a representation of the subspace, and perform event detection by running point-to-subspace distance queries for incoming data. To obtain a low-rank model, we apply a variant of low-rank factorisation, which considers interval uncertainty sets around "known entries", to a suitable flattening of the input data. On-line, we compute the distance of incoming data to the low-rank normal subspace and update the subspace to keep it consistent with the seasonal changes present. For the distance computation, we suggest subsampling. We bound the one-sided error as a function of the number of coordinates employed, using techniques from learning theory and computational geometry. In our experimental evaluation, we have tested the ability of the proposed algorithm to identify samples of abnormal behaviour in induction-loop data from Dublin, Ireland.
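The off-line and on-line steps described above can be illustrated with a minimal sketch. It is not the method of the paper: a plain truncated SVD is swapped in for the interval-uncertainty factorisation, so missing entries and uncertainty sets are ignored, and all names (`fit_subspace`, `distance_to_subspace`), the synthetic data, and the simulated event are assumptions made for illustration only.

```python
import numpy as np

def fit_subspace(X, rank):
    """Orthonormal basis (n_features x rank) spanning the top-`rank`
    right singular directions of X (rows are normal observations).
    Truncated SVD is a stand-in for the paper's factorisation."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:rank].T

def distance_to_subspace(x, basis):
    """Euclidean distance from x to the column span of an orthonormal basis."""
    residual = x - basis @ (basis.T @ x)
    return float(np.linalg.norm(residual))

rng = np.random.default_rng(0)
n_days, n_features, rank = 200, 50, 3

# Synthetic "normal" data: a low-rank signal plus small uniform noise.
latent = rng.normal(size=(n_days, rank)) @ rng.normal(size=(rank, n_features))
X = latent + rng.uniform(-0.05, 0.05, size=latent.shape)

basis = fit_subspace(X, rank)

normal_day = X[0]
abnormal_day = X[0].copy()
abnormal_day[:10] += 5.0  # hypothetical event affecting ten coordinates

d_normal = distance_to_subspace(normal_day, basis)
d_abnormal = distance_to_subspace(abnormal_day, basis)
print(f"normal: {d_normal:.3f}, abnormal: {d_abnormal:.3f}")
```

In this toy setting the distance for the perturbed observation is much larger than for the unperturbed one, which is the signal a distance threshold would act on.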

Highlights

  • When detailed multivariate data are available in real time, it is highly desirable to monitor the appearance of “abnormal” behavior across the multivariate data, with guarantees on the performance of the monitoring procedure, but without the computational burden of processing the data set in its entirety

  • An experimental evaluation on data from a traffic-control system in Dublin, Ireland, shows that it is possible to process data collected from thousands of sensors over the course of one year within minutes, to answer point-to-subspace distance queries in milliseconds, and to detect even hard-to-detect events

  • This is the first time such guarantees have been provided for any subsampling in matrix completion


Summary

INTRODUCTION

When detailed multivariate data are available in real time, it is highly desirable to monitor the appearance of “abnormal” behavior across the multivariate data, with guarantees on the performance of the monitoring procedure, but without the computational burden of processing the data set in its entirety. Many cities have been instrumented with large numbers of sensors capturing the numbers and average speeds of cars passing through the approaches of urban intersections (induction loops), the volume of traffic (from CCTV data or aggregate data of mobile-phone operators), and the speeds of public transport vehicles (e.g., on-board satellite positioning units in buses), but many still lack the infrastructure to detect traffic accidents prior to them being reported. This is to a large extent due to the limited utility of the information from each of the sensors, e.g., maintaining statistics about traffic at a particular approach of an intersection. A point-in-subspace membership test checks whether an incoming observation is consistent with the low-rank model of normal behavior; in the case of a negative answer, a point-to-subspace distance query can estimate the extent of abnormality of an event. This membership test can be subsampled, while still allowing for guarantees on its performance. An experimental evaluation on data from a traffic-control system in Dublin, Ireland, shows that it is possible to process data collected from thousands of sensors over the course of one year within minutes, to answer point-to-subspace distance queries in milliseconds, and to detect even hard-to-detect events.
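A subsampled membership test of the kind mentioned above might look as follows. This is a sketch under stated assumptions, not the tester analysed in the paper: it fits coefficients by ordinary least squares on a random subset of coordinates and compares the largest residual against a tolerance. The function names, the tolerance `tol`, the number of sampled coordinates, and the synthetic data are all hypothetical.

```python
import numpy as np

def subsampled_residual(x, basis, n_coords, rng):
    """Largest absolute residual of x against the subspace, using only
    `n_coords` randomly chosen coordinates (least-squares fit on those rows)."""
    idx = rng.choice(x.shape[0], size=n_coords, replace=False)
    coeffs, *_ = np.linalg.lstsq(basis[idx], x[idx], rcond=None)
    residual = x[idx] - basis[idx] @ coeffs
    return float(np.max(np.abs(residual)))

def is_abnormal(x, basis, n_coords, tol, rng):
    """Flag x when the subsampled residual exceeds the tolerance."""
    return subsampled_residual(x, basis, n_coords, rng) > tol

rng = np.random.default_rng(1)
n_features, rank, n_coords, tol = 50, 3, 20, 0.5

# Hypothetical "normal" subspace: orthonormal basis from a random matrix.
basis, _ = np.linalg.qr(rng.normal(size=(n_features, rank)))

# Normal point: in the subspace, plus uniform noise bounded by 0.05.
x_normal = basis @ rng.normal(size=rank) + rng.uniform(-0.05, 0.05, n_features)

# Abnormal point: the same observation with a shift on every other coordinate.
x_abnormal = x_normal.copy()
x_abnormal[::2] += 5.0

normal_flag = is_abnormal(x_normal, basis, n_coords, tol, rng)
abnormal_flag = is_abnormal(x_abnormal, basis, n_coords, tol, rng)
print(normal_flag, abnormal_flag)
```

The sketch also shows why the error is one-sided in spirit: restricting to fewer coordinates can only hide a deviation (an event confined to unsampled coordinates goes unnoticed), whereas a point within the noise bound of the subspace stays below the tolerance on any subset.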

AN APPROACH
THE MODULOR FRAMEWORK
SUBSAMPLED POINT-TO-SUBSPACE PROXIMITY TESTER
AN ANALYSIS
EXPERIMENTAL EVALUATION
THE RESULTS
RELATED WORK
Findings
CONCLUSIONS
