Multivariate Pointwise Information-Driven Data Sampling and Visualization.

Soumya Dutta, James Ahrens, Ayan Biswas

doi:10.3390/e21070699

Open Access

https://doi.org/10.3390/e21070699

Journal: Entropy (Basel, Switzerland)
Publication Date: Jul 16, 2019
Citations: 81
License type: CC BY 4.0

Affiliation: Los Alamos National Laboratory

Abstract

With the increasing computing capabilities of modern supercomputers, the size of the data generated by scientific simulations is growing rapidly. As a result, application scientists need effective data summarization techniques that can reduce large-scale multivariate spatiotemporal data sets while preserving the important data properties, so that the reduced data can answer domain-specific queries involving multiple variables with sufficient accuracy. When analyzing complex scientific events, domain experts often analyze and visualize two or more variables together to better understand the characteristics of the data features. Data summarization techniques are therefore required to analyze multi-variable relationships in detail and then perform data reduction such that the important features involving multiple variables are preserved in the reduced data. To achieve this, we propose a data sub-sampling algorithm for statistical data summarization that leverages pointwise information-theoretic measures to quantify the statistical association of data points across multiple variables and generates sub-sampled data that preserves the statistical association among those variables. Using such reduced, sampled data, we show that multivariate feature query and analysis can be done effectively. The efficacy of the proposed multivariate association-driven sampling algorithm is demonstrated by applying it to several scientific data sets.

Highlights

The size of scientific data sets is increasing rapidly with ever-increasing computing capabilities. Modern-day supercomputers can generate data on the order of petabytes, and soon we will enter the era of exascale computing [1,2].
We introduced the information-theoretic measure pointwise mutual information (PMI), which quantifies the statistical association at each data point but applies to two variables only.
We presented pointwise mutual information (PMI) and a generalized extension of it, which together allow us to quantify the importance of each data point in terms of its statistical association across multiple variables.
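
To make the two-variable case concrete, the following is a minimal sketch of how PMI, log(p(x, y) / (p(x) p(y))), could be estimated per data point and used to bias a sub-sample. It is not the authors' implementation: the histogram-based probability estimate, the bin count, the shift that turns PMI values into non-negative sampling weights, and the function names `pointwise_mutual_information` and `pmi_weighted_sample` are all assumptions made for illustration.

```python
import numpy as np


def pointwise_mutual_information(x, y, bins=8):
    """Estimate PMI = log(p(x,y) / (p(x) p(y))) for every point.

    Probabilities come from a joint 2D histogram (an assumed estimator).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    joint, x_edges, y_edges = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()      # joint probability per bin
    p_x = p_xy.sum(axis=1)          # marginal of the first variable
    p_y = p_xy.sum(axis=0)          # marginal of the second variable
    # Bin index of each point; using interior edges yields 0..bins-1.
    ix = np.digitize(x, x_edges[1:-1])
    iy = np.digitize(y, y_edges[1:-1])
    return np.log(p_xy[ix, iy] / (p_x[ix] * p_y[iy]))


def pmi_weighted_sample(x, y, fraction=0.1, bins=8, seed=0):
    """Pick indices without replacement, favoring high-PMI points.

    The weighting scheme (PMI shifted to be positive) is hypothetical.
    """
    pmi = pointwise_mutual_information(x, y, bins)
    weights = pmi - pmi.min() + 1e-12   # keep every weight strictly positive
    prob = weights / weights.sum()
    rng = np.random.default_rng(seed)
    k = max(1, int(fraction * len(pmi)))
    return rng.choice(len(pmi), size=k, replace=False, p=prob)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    a = rng.normal(size=5000)
    b = a + 0.3 * rng.normal(size=5000)   # a variable correlated with a
    idx = pmi_weighted_sample(a, b, fraction=0.05)
    print(len(idx), "points kept out of", len(a))
```

Points that fall in bins where the two variables co-occur more often than independence would predict receive positive PMI and are therefore more likely to be retained. Extending this to more than two variables requires a generalized pointwise measure, which this sketch does not attempt.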

Summary

Introduction

The size of scientific data sets is increasing rapidly with ever-increasing computing capabilities. Modern-day supercomputers can generate data on the order of petabytes, and soon we will enter the era of exascale computing [1,2]. As data sets keep growing, traditional analysis and visualization techniques that rely on full-resolution raw data will soon become prohibitive, since storing, parsing, and analyzing the full-resolution raw data will no longer be viable [3,4,5,6]. This is primarily due to the gap between disk I/O speed and data generation speed. Only a small subset of the data can be moved to permanent storage for exploratory post-hoc analysis.

Similar Papers

Center for Technology for Advanced Scientific Component Software (TASCS)
Madhusudhan Govindaraju
31 Oct 2010

Relationship-aware Multivariate Sampling Strategy for Scientific Simulation Data
Subhashis Hazarika ... Earl Lawrence
01 Oct 2020

Introduction to multivariate analyses
...
21 Mar 2002

Supporting correlation analysis on scientific datasets in parallel and distributed settings
Yu Su ... Han-Wei Shen
23 Jun 2014