Abstract

The intrinsic nature of noisy and complex data sets is often concealed in low-dimensional structures embedded in a higher-dimensional space. A number of methodologies have been developed to extract and represent such structures in the form of manifolds (i.e. geometric structures that locally resemble continuously deformable intervals of R^j). These methods usually require a priori knowledge of the manifold's intrinsic dimensionality. Additionally, their performance can be hampered by significant high-dimensional noise aligned along the low-dimensional core manifold. In real-world applications, the data can contain several low-dimensional structures of different dimensionalities. We propose a framework for dimensionality estimation and reconstruction of multiple noisy manifolds embedded in a noisy environment. To the best of our knowledge, this work represents the first attempt at detecting and modelling a set of coexisting general noisy manifolds by uniting two aspects of multi-manifold learning: the recovery and approximation of core noiseless manifolds and the construction of their probabilistic models. The easy-to-understand hyper-parameters can be manipulated to obtain an emerging picture of the multi-manifold structure of the data. We demonstrate the workings of the framework on two synthetic data sets that present challenging features for state-of-the-art techniques in multi-manifold learning. The first data set consists of multiple sampled noisy manifolds of different intrinsic dimensionalities, such as a Möbius strip, a toroid and a spiral arm. The second is a topologically complex set of three interlocked toroids. Given the absence of such unified methodologies in the literature, the comparison with existing techniques is organized along the two separate aspects of our approach mentioned above, namely manifold approximation and probabilistic modelling.
The framework is then applied to a complex data set containing simulated gas volume particles from a particle simulation of a dwarf galaxy interacting with its host galaxy cluster. Detailed analysis of the recovered 1D and 2D manifolds can help us understand the nature of star formation in such complex systems.

Highlights

  • Dimensionality reduction and density estimation of raw data are commonly used tools for extracting information from complex and noisy data sets

  • Taking advantage of the probabilistic nature of the Abstract GTM (AGTM) model, we show in Fig. 24a the embedded vertices v_j of the graph G for the stream model, with intensity modulated by the weighted mean of the [C II] values I_i^[C II] of the particles t_i in the manifold, where the weights are the posterior probabilities of the node v_j given the particles t_i

  • Semi-automated framework for denoising, dimensionality estimation, multi-manifold extraction and manifold-aligned density estimation from complex data sets containing samples from noisy manifolds of diverse dimensionalities embedded in a noisy environment
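
The posterior-weighted mean described in the second highlight can be written out explicitly. What follows is the standard responsibility-weighted average, offered as a sketch rather than the paper's own equation, which is truncated in this summary. The symbols \bar{I}_j, R_{ji} and N (the number of particles) are assumptions introduced here for illustration.

```latex
\bar{I}_j \;=\; \frac{\sum_{i=1}^{N} R_{ji}\, I_i^{[\mathrm{C\,II}]}}{\sum_{i=1}^{N} R_{ji}},
\qquad
R_{ji} \;=\; p(v_j \mid t_i)
```

Under this reading, a node v_j is bright when the particles it most probably explains carry high [C II] values.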


Summary

Introduction

Dimensionality reduction and density estimation of raw data are commonly used tools for extracting information from complex and noisy data sets. We generalize the GTM model so that densities aligned along arbitrary manifolds (even non-orientable ones, such as a Möbius strip) can be captured. This is achieved by replacing the simple Euclidean latent space (generally parametrized as a discretized interval of R^j) with an abstract graph reflecting the topology of the data manifold. When embedded in the data space, this graph provides a manifold skeleton around which the noise models can be organized. This work is inspired by [31], but extends and generalizes it in three ways: (1) it proposes a new, robust estimation of the dimensionality index for data points, (2) through a dedicated manifold crawling mechanism it allows for completely abstract manifold representations in the GTM latent space (instead of a regular grid) and (3) it has Gaussian noise components naturally aligned along the manifold, unlike the spherical noise models in the original GTM and [31].
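
To make the idea of noise components aligned along a graph-based skeleton more concrete, here is a minimal Python sketch. It is not the authors' implementation: it assumes a ring graph embedded as a circle in 2D, equal mixing weights for all nodes (as in the original GTM), tangents estimated from neighbouring graph nodes, and hand-picked variances. All function names (`aligned_covariance`, `log_gauss`, `responsibilities`, `node_weighted_mean`) are hypothetical and introduced only for illustration.

```python
import numpy as np

def aligned_covariance(tangent, var_along, var_across):
    """Covariance that is wide along the tangent basis and narrow across it.
    tangent: (D, d) array with orthonormal columns (assumed input)."""
    D = tangent.shape[0]
    P = tangent @ tangent.T                      # projector onto tangent space
    return var_along * P + var_across * (np.eye(D) - P)

def log_gauss(X, mean, cov):
    """Log-density of a full-covariance Gaussian at each row of X."""
    D = X.shape[1]
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, (X - mean).T)         # whitened residuals, (D, N)
    maha = np.sum(z ** 2, axis=0)
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (maha + logdet + D * np.log(2.0 * np.pi))

def responsibilities(X, means, covs):
    """Posterior p(node j | point i) under equal node priors; shape (K, N)."""
    log_p = np.stack([log_gauss(X, m, c) for m, c in zip(means, covs)])
    log_p -= log_p.max(axis=0, keepdims=True)    # numerical stabilisation
    p = np.exp(log_p)
    return p / p.sum(axis=0, keepdims=True)

def node_weighted_mean(R, values):
    """Posterior-weighted mean of a per-point scalar for every node."""
    return (R @ values) / R.sum(axis=1)

# Hypothetical skeleton: K nodes of a ring graph embedded on the unit circle.
K = 12
theta = np.linspace(0.0, 2.0 * np.pi, K, endpoint=False)
means = np.column_stack([np.cos(theta), np.sin(theta)])

# Tangent at node j from its two graph neighbours (j-1 and j+1 on the ring).
diff = np.roll(means, -1, axis=0) - np.roll(means, 1, axis=0)
tangents = diff / np.linalg.norm(diff, axis=1, keepdims=True)
covs = [aligned_covariance(t[:, None], 0.05, 0.005) for t in tangents]

# Noisy samples around the circle and their node posteriors.
rng = np.random.default_rng(0)
phi = rng.uniform(0.0, 2.0 * np.pi, 200)
X = np.column_stack([np.cos(phi), np.sin(phi)]) + 0.05 * rng.normal(size=(200, 2))
R = responsibilities(X, means, covs)

# Example per-point scalar (here simply the x-coordinate) averaged per node.
node_mean = node_weighted_mean(R, X[:, 0])
```

The same `node_weighted_mean` call could be applied to any per-particle quantity. Nothing in the sketch depends on the latent structure being a regular grid, which is the point of using an abstract graph.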

Methodology overview
Density model of a single noisy manifold
Abstract GTM
Manifold crawling
Multi manifold learning
Local dimensionality estimation
An efficient alternative to multi-manifold crawling
Experimental comparison to existing methods
Multi-manifold learning
Methods
Probabilistic modelling
Experiments on a jellyfish galaxy
A multi manifold analysis of a dwarf jellyfish galaxy
Detecting manifolds of higher dimensions
Findings
Conclusion
