Abstract

In this work, we applied a stochastic simulation methodology to quantify the power to detect outlying mixture components of a stochastic model when applying a reduced-dimension clustering technique such as Self-Organizing Maps (SOMs). The essential feature of SOMs, besides dimensional reduction into a discrete map, is the conservation of topology. In SOMs, two forms of learning are applied: competitive, by sequential allocation of sample observations to a winning node in the map, and cooperative, by the update of the weights of the winning node and its neighbors. Cooperative learning is what conserves the topology of the original data space in the reduced (typically 2D) map. Here, we compared the performance of one- and two-layer SOMs in the outlier representation task. The same stratified sampling was applied to both the one-layer and two-layer SOMs to estimate the power to detect the outlying mixture component, although stratification is only relevant for the two-layer setting. Two distance measures between points in the map were defined to quantify the conservation of topology. The results of the experiment showed that the two-layer setting was more efficient in outlier detection while maintaining the basic properties of the SOM, including an adequate representation of the distances from the outlier component to the remaining ones.

Highlights

  • The purpose of this paper was to apply stochastic simulation for a better understanding of the possibilities of outlier component detection in a Gaussian mixture using one- and two-layer Self-Organizing Maps (SOMs)

  • In many real problems it is useful to keep a representation of the outlier in the map while respecting the essence of the standard SOM; we studied how well the SOM does so in the situations simulated in our experiment, and checked whether our expectation was correct that two-layer SOMs would represent and detect such outliers better than one-layer ones while adequately representing the distance between the outlier component and the remaining components

  • The SOM is a neural network that allows us to project a high-dimensional vector space onto a low-dimensional topology made up of a set of nodes, or neurons, arranged as a grid
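
The two forms of learning described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the configuration used in the paper: the function names `train_som` and `winning_node`, the 3×3 grid, the Gaussian neighbourhood, the linear decay schedules, and the initialisation from sampled observations are all choices made here for illustration.

```python
import numpy as np

def train_som(data, rows=3, cols=3, epochs=20, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small online SOM and return (weights, grid).

    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    # Integer grid coordinates of every node on the 2D map.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Assumed initialisation: node weights are observations drawn from the sample.
    weights = data[rng.choice(len(data), size=rows * cols, replace=True)].astype(float)
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3   # shrinking neighbourhood radius
            # Competitive step: the winner is the node whose weights are closest to x.
            winner = np.argmin(np.linalg.norm(weights - x, axis=1))
            # Cooperative step: the winner and its grid neighbours move toward x,
            # weighted by a Gaussian of their distance on the map.
            grid_d2 = np.sum((grid - grid[winner]) ** 2, axis=1)
            h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights, grid

def winning_node(weights, x):
    """Index of the node whose weight vector is closest to observation x."""
    return int(np.argmin(np.linalg.norm(weights - x, axis=1)))
```

Calling `train_som(data)` on an array of shape `(n_samples, n_features)` returns the node weights (centroids) and their map coordinates; `winning_node(weights, x)` then assigns a new observation to a node.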

Summary

Introduction

The purpose of this paper was to apply stochastic simulation for a better understanding of the possibilities of outlier component detection in a Gaussian mixture using one- and two-layer Self-Organizing Maps (SOMs). Since the stratum including the outlier will have several nodes associated with it in its first-layer map, one of these nodes may adequately represent the outlying component. If so, it may receive a node of its own in the second (final) map. Note that when comparing two SOMs in the preservation of topology, the distance in the original space is the same for both, so one just has to compare the distances in the SOM. Should that distance depend only on the integer-valued map coordinates, or on the weights (centroids) of the corresponding winning nodes?
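
The closing question can be made concrete with a small sketch. The two functions below are hypothetical stand-ins for the two options named in that question. They are not the image-based and graph-based distances defined in the paper, whose definitions are not reproduced in this summary. The 2×2 map and its weights are invented for illustration.

```python
import numpy as np

def grid_distance(coord_a, coord_b):
    """Euclidean distance between the integer map coordinates of two nodes."""
    a = np.asarray(coord_a, dtype=float)
    b = np.asarray(coord_b, dtype=float)
    return float(np.linalg.norm(a - b))

def centroid_distance(weights, node_a, node_b):
    """Euclidean distance between the weight vectors (centroids) of two nodes."""
    return float(np.linalg.norm(weights[node_a] - weights[node_b]))

# Hypothetical 2x2 map: node 3 has captured a far-away outlying component.
coords = [(0, 0), (0, 1), (1, 0), (1, 1)]
weights = np.array([[0.0, 0.0],
                    [1.0, 0.0],
                    [0.0, 1.0],
                    [10.0, 10.0]])

d_grid = grid_distance(coords[0], coords[3])    # sqrt(2): bounded by the map size
d_cent = centroid_distance(weights, 0, 3)       # sqrt(200): reflects the data space
```

In this toy map the coordinate-based distance between nodes 0 and 3 cannot exceed the map diagonal, whereas the centroid-based distance keeps the scale of the original space. That contrast is one reason the choice between the two matters when judging how well the outlier's separation is represented.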

The SOM Algorithm
Two-Layer SOM
The Computational Experiment
The Stochastic Gaussian Mixture Model for the Simulations
The Sampling Procedure and Strata Configuration
Structure and Parameters of One- and Two-Layer SOMs
SOM Node Initialization
Conservation of Topology
Image-Based Distance
Graph-Based Distance
Toy Example
Image Distance
Graph Distance
Results
One-Layer SOM
Second-Layer Results
Interpretation of the Results and Concluding Discussion