Abstract

In the era of big data, considerable research focus is being put on designing efficient algorithms capable of learning and extracting high-level knowledge from ubiquitous data streams in an online fashion. While most existing algorithms assume that data samples are drawn from a stationary distribution, many real-world environments deal with data streams that are subject to change over time. Taking this aspect into consideration is an important step towards building truly aware and intelligent systems. In this paper, we propose GNG-A, an adaptive method for incremental unsupervised learning from evolving data streams experiencing various types of change. The proposed method maintains a continuously updated network (graph) of neurons by extending the Growing Neural Gas algorithm with three complementary mechanisms, allowing it to closely track both gradual and sudden changes in the data distribution. First, an adaptation mechanism handles local changes where the distribution is non-stationary only in some regions of the feature space. Second, an adaptive forgetting mechanism identifies and removes neurons that become irrelevant due to the evolving nature of the stream. Finally, a probabilistic evolution mechanism creates new neurons when there is a need to represent data in new regions of the feature space. The proposed method is demonstrated for anomaly and novelty detection in non-stationary environments. Results show that it handles different data distributions and reacts efficiently to various types of change.
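For context, GNG-A builds on standard Growing Neural Gas, whose core online step adapts a small graph of neurons to each incoming sample. The sketch below illustrates that standard GNG step only, not GNG-A itself; the data structures and parameter values (eps_b, eps_n, max_age) are conventional GNG choices assumed here for illustration.

```python
import numpy as np

def gng_update_step(x, neurons, edges, ages, errors,
                    eps_b=0.05, eps_n=0.006, max_age=50):
    """One adaptation step of standard GNG for a single sample x.

    neurons: dict {id: np.ndarray weight vector}
    edges:   set of frozenset({i, j}) undirected links
    ages:    dict {edge: int age}
    errors:  dict {id: float accumulated representation error}
    """
    # 1. Find the two neurons nearest to the sample.
    dists = {i: np.linalg.norm(x - w) for i, w in neurons.items()}
    s1, s2 = sorted(dists, key=dists.get)[:2]

    # 2. Accumulate the winner's representation error.
    errors[s1] += dists[s1] ** 2

    # 3. Move the winner (strongly) and its graph neighbors (weakly)
    #    toward x, aging every edge incident to the winner.
    neurons[s1] += eps_b * (x - neurons[s1])
    for e in [e for e in edges if s1 in e]:
        ages[e] += 1
        neighbor = next(i for i in e if i != s1)
        neurons[neighbor] += eps_n * (x - neurons[neighbor])

    # 4. Link the two winners with a fresh edge (age reset to 0).
    edge = frozenset((s1, s2))
    edges.add(edge)
    ages[edge] = 0

    # 5. Remove edges that are too old (isolated-neuron removal omitted).
    for e in [e for e in edges if ages[e] > max_age]:
        edges.remove(e)
        del ages[e]
```

GNG-A's contribution, as the abstract describes, is to make the forgetting and growth behavior governed by such fixed global parameters adaptive and local to regions of the feature space.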

Highlights

  • Conventional machine learning and data mining methods learn a model by performing several passes over a static dataset

  • We propose GNG-A, an extension of the Growing Neural Gas (GNG) algorithm, and show how it is used for novelty and anomaly detection in evolving data streams

  • GNG-A is summarized in Algorithm 2, which calls Algorithm 3 to check for the removal of neurons and Algorithm 4 to check for the creation of neurons; a skeleton of this control flow is sketched below
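Since the paper's pseudocode is not reproduced on this page, the following hypothetical skeleton only illustrates the control flow that the last highlight describes; the method names are placeholders, not identifiers from the paper.

```python
def gng_a_main_loop(stream, network):
    """Hypothetical skeleton of GNG-A's per-sample processing (Algorithm 2):
    adapt the network, then check for neuron removal (Algorithm 3) and
    neuron creation (Algorithm 4). All names here are assumed."""
    for x in stream:
        network.adapt(x)           # adapt existing neurons to the new sample
        network.check_removal()    # Algorithm 3: drop irrelevant neurons
        network.check_creation(x)  # Algorithm 4: grow into new regions
    return network
```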

Summary

Introduction

Conventional machine learning and data mining methods learn a model by performing several passes over a static dataset. We address the question of how to incrementally adapt to changes in a non-stationary distribution without requiring sensitive hyper-parameters to be manually tuned. The problem is both interesting and important, as evolving data streams are present in a large number of dynamic processes. Existing incremental methods (such as GNG and its variants) require an expert to specify sensitive parameters that directly affect the evolution or the forgetting rate of the neural network. Setting such global parameters prior to learning does not address the more general case where the speed of change can vary over time, or where the distribution becomes non-stationary only in some specific regions of the feature space.
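As a concrete instance of such a global parameter, classical GNG exponentially decreases the accumulated representation error of all neurons by a fixed factor at every iteration. The sketch below shows that fixed-rate forgetting, with an assumed decay value; it is meant only to illustrate the limitation discussed above.

```python
# Fixed-rate forgetting as in classical GNG: after each input, the
# accumulated representation error of every neuron decays by the same
# global factor, whatever the local speed of change in the stream.
BETA = 0.995  # assumed value; in practice it must be hand-tuned per stream

def decay_errors(errors, beta=BETA):
    """Exponentially decrease the representation error of all neurons."""
    for i in errors:
        errors[i] *= beta
```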

Preliminaries and related work
Adaptation of existing neurons
Forgetting by removing irrelevant neurons
Estimating the relevance of a neuron
Adaptive removal of neurons
Dynamic creation of new neurons
Algorithm
Experiments
Datasets
General properties of GNG-A
Anomaly and novelty detection
Conclusion and future work
A Details about datasets
B Details about parameters
