Abstract

Scientific data, generated by computational models or from experiments, are typically the result of nonlinear interactions among several latent processes. Such datasets are high-dimensional and exhibit strong temporal correlations. Better understanding of the underlying processes requires mapping such data to a low-dimensional manifold where the dynamics of the latent processes are evident. While nonlinear spectral dimensionality reduction methods, e.g., Isomap, and their scalable variants are conceptually suitable candidates for obtaining such a mapping, the strong temporal correlation in the data can significantly degrade their results. In this paper, we first show why such methods fail when dealing with dynamic process data. We then propose a novel method, Entropy-Isomap, to address this shortcoming. We demonstrate the effectiveness of the proposed method in the context of understanding the fabrication process of organic materials. The resulting low-dimensional representation correctly characterizes the process control variables and allows for informative visualization of the material morphology evolution.

Highlights

  • Scientific data, either produced by complex numerical simulations or collected by high-resolution scientific instruments, are typically characterized by three salient features: (i) massive data volumes; (ii) high dimensionality; and (iii) the presence of strong temporal correlation in the data

  • The term process data means any data that represent the evolution of some process states over time. Most of these high-dimensional datasets are generated through an interplay of a few physical processes. Such interactions are typically nonlinear, which means that linear dimensionality reduction methods, such as principal component analysis (PCA), are not applicable here

  • We provide a gentle introduction to spectral dimensionality reduction methods and to the process data encountered in our target application, and review related techniques from the literature on these topics


Summary

Introduction

Scientific data, either produced by complex numerical simulations or collected by high-resolution scientific instruments, are typically characterized by three salient features: (i) massive data volumes; (ii) high dimensionality; and (iii) the presence of strong temporal correlation in the data. Representation of this big process data in a low-dimensional (2D or 3D) space can reveal key insights into the dynamics of the underlying scientific processes at play. The term process data means any data that represent the evolution of some process states over time (see Figure 1). Most of these high-dimensional datasets are generated through an interplay of a few physical processes. Such interactions are typically nonlinear, which means that linear dimensionality reduction methods, such as principal component analysis (PCA), are not applicable here. Spectral dimensionality reduction (SDR) refers to a family of methods that map high-dimensional data to a low-dimensional representation by learning the low-dimensional structure in the original data. The feature matrix, F, captures the structure of the data through some selected property (e.g., pairwise distances).
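To make the role of a feature matrix concrete, the following is a minimal sketch of classical multidimensional scaling (MDS), the spectral step that Isomap applies to geodesic distances. It is an illustration under stated assumptions, not the Entropy-Isomap method: it assumes NumPy, uses plain Euclidean distances rather than graph geodesics, and the function name `classical_mds` is hypothetical. Here F is taken to be the double-centered matrix of squared pairwise distances, and the embedding comes from its leading eigenvectors.

```python
import numpy as np


def classical_mds(X, n_components=2):
    """Embed the rows of X using the top eigenvectors of a feature matrix F.

    F is built from pairwise squared Euclidean distances (an assumption
    made for illustration; Isomap would use geodesic distances instead).
    """
    n = X.shape[0]

    # Squared pairwise distances: ||x_i||^2 + ||x_j||^2 - 2 <x_i, x_j>.
    sq_norms = np.sum(X ** 2, axis=1)
    D2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    D2 = np.maximum(D2, 0.0)  # clip tiny negatives caused by round-off

    # Double centering turns squared distances into a Gram-like matrix F.
    J = np.eye(n) - np.ones((n, n)) / n
    F = -0.5 * J @ D2 @ J

    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(F)
    order = np.argsort(eigvals)[::-1][:n_components]
    top_vals = np.maximum(eigvals[order], 0.0)

    # Scale each eigenvector by the square root of its eigenvalue.
    return eigvecs[:, order] * np.sqrt(top_vals)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 20))  # synthetic stand-in for process data
    Y = classical_mds(X, n_components=2)
    print(Y.shape)
```

Because this sketch relies on Euclidean distances, it behaves like a linear method; the nonlinear SDR methods discussed here differ mainly in how the property stored in F is computed.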
