Abstract

Detecting and processing anomalies in Big Data in real time is a difficult task. The volume and velocity of the data in many systems make it difficult for typical algorithms to scale while retaining their real-time characteristics. Moreover, many existing algorithms consider only the content of the data source (e.g. a sensor reading itself) without regard for its context, which, given the pervasiveness of data, leaves room for improvement. The proposed work defines a contextual anomaly detection framework composed of two distinct steps: content detection and context detection. The content detector determines anomalies in real time, though it is likely to identify false positives as well. The context detector prunes the output of the content detector, retaining only those anomalies that are anomalous in both content and context. The context detector relies on profiles, which are groups of similar data points generated by a multivariate clustering algorithm. The research has been evaluated against two real-world sensor datasets provided by a local company in Brampton, Canada. Additionally, the framework has been evaluated against the open-source Dodgers dataset, available at the UCI Machine Learning Repository, and against the R statistical toolbox.
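The two-step flow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The z-score tests, the threshold of 3.0, the names `Profile`, `content_anomalous`, `context_anomalous` and `detect`, and the hand-written day/night profiles are all assumptions made for illustration. In particular, the multivariate clustering step that generates the profiles is omitted, and the profiles are simply supplied by hand.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Profile:
    """Hypothetical summary of one group of similar sensor readings."""
    mean: float
    std: float


def content_anomalous(value, history, threshold=3.0):
    """Step 1 (assumed z-score test): compare a reading with recent values only."""
    mu, sigma = mean(history), pstdev(history)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


def context_anomalous(value, profile, threshold=3.0):
    """Step 2 (assumed z-score test): compare a reading with its profile."""
    if profile.std == 0:
        return value != profile.mean
    return abs(value - profile.mean) / profile.std > threshold


def detect(readings, history, profiles, profile_of):
    """Keep only readings that are anomalous in both content and context."""
    flagged = []
    for context, value in readings:
        if not content_anomalous(value, history):
            continue  # cheap content check rejects most readings
        if context_anomalous(value, profiles[profile_of(context)]):
            flagged.append((context, value))
    return flagged


# Illustrative data: contexts are hours of the day, values are temperatures.
history = [20, 21, 19, 20, 22, 21, 20, 19, 21, 20]
profiles = {"day": Profile(28.0, 2.0), "night": Profile(20.0, 1.0)}
profile_of = lambda hour: "day" if 8 <= hour < 20 else "night"
readings = [(14, 30.0), (3, 30.0), (3, 21.0)]

result = detect(readings, history, profiles, profile_of)
print(result)
```

In this toy data, the reading of 30.0 at hour 14 passes the content check but is pruned because it fits the assumed daytime profile, while the same value at hour 3 is retained because it also deviates from the night-time profile.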

Highlights

  • Anomalies are abnormal events or patterns that do not conform to expected events or patterns [1]

  • Running only the context detector over the entire test dataset revealed no context anomalies that were not passed to the content detector

  • The paper presents a novel framework for anomaly detection in Big Data

Summary

Introduction

Anomalies are abnormal events or patterns that do not conform to expected events or patterns [1]. Anomalies are generally categorized into three types: point (or content) anomalies, context anomalies, and collective anomalies. Point anomalies are data points that are considered abnormal when viewed against the whole dataset. Context anomalies are data points that are considered abnormal when viewed against meta-information associated with the data points. Collective anomalies are groups of data points that are considered anomalous when viewed together against the rest of the dataset. Detection algorithms can correspondingly be categorized as point detection, collective detection, or context-aware detection algorithms [1]. Contextual anomalies exist where the dataset includes a combination of behavioural and contextual attributes. These are also termed indicator and environmental attributes, respectively, following Song et al. [9].
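The difference between a point anomaly and a context anomaly can be made concrete with a small Python sketch. It assumes a single behavioural attribute (temperature), a single contextual attribute (season), and a z-score test with a threshold of 2.5. The data, the threshold and the names `zscore` and `classify` are illustrative choices, not taken from the paper.

```python
from statistics import mean, pstdev


def zscore(value, sample):
    """Absolute z-score of value relative to sample (0.0 if sample is constant)."""
    sigma = pstdev(sample)
    return 0.0 if sigma == 0 else abs(value - mean(sample)) / sigma


def classify(record, dataset, threshold=2.5):
    """Test one (season, temperature) record against the whole dataset
    (point anomaly) and against records sharing its season (context anomaly)."""
    season, temp = record
    all_temps = [t for _, t in dataset]
    same_context = [t for s, t in dataset if s == season]
    return {
        "point": zscore(temp, all_temps) > threshold,
        "context": zscore(temp, same_context) > threshold,
    }


# Illustrative history: (contextual attribute, behavioural attribute).
dataset = [
    ("summer", 28.0), ("summer", 30.0), ("summer", 29.0),
    ("summer", 31.0), ("summer", 30.0),
    ("winter", 0.0), ("winter", 2.0), ("winter", 1.0),
    ("winter", -1.0), ("winter", 1.0),
]

print(classify(("winter", 25.0), dataset))
print(classify(("summer", 60.0), dataset))
```

A winter reading of 25.0 lies well within the range of the whole dataset, so it is not a point anomaly, yet it is far from every other winter reading and is therefore a context anomaly. A summer reading of 60.0 is abnormal under both views.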
