Abstract

Distribution-based data representation is a promising approach to handling large-scale scientific datasets. Distribution-based approaches often transform a scientific dataset into many distributions, each of which is calculated from a small number of samples. Most existing parallel algorithms focus on efficiently modeling a single distribution from many input samples, so they may not fit the large-scale scientific data processing scenario, where they cannot utilize computing resources effectively. Histograms and Gaussian Mixture Models (GMMs) are the most popular distribution representations used to model scientific datasets. We therefore propose multi-set histogram and GMM modeling algorithms for large-scale scientific data processing. Our algorithms are built on data-parallel primitives to achieve portability across different hardware architectures. We evaluate the performance of the proposed algorithms in detail and demonstrate use cases for scientific data processing.

Highlights

  • The domain structure describes the topological structure of a scientific dataset, which specifies the relationships among the locations where data values are stored

  • This paper presents parallel algorithms for multi-variant histogram and Gaussian Mixture Model (GMM) modeling

  • The algorithms can efficiently model histograms and GMMs from samples that are divided into multiple sets
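
To make the idea of modeling many sets at once concrete, the following is a minimal sketch of a multi-set histogram expressed in the style of data-parallel primitives (map, sort-by-key, reduce-by-key). It is not the authors' implementation: the function name `multiset_histogram`, its parameters, and the flat-key encoding are assumptions made for illustration, and plain Python stands in for a real data-parallel library.

```python
from itertools import groupby


def multiset_histogram(values, set_ids, n_sets, n_bins, lo, hi):
    """Build one histogram per set in a single pass over all samples.

    Hypothetical helper for illustration; every set shares the same
    value range [lo, hi] and the same number of bins.
    """
    width = (hi - lo) / n_bins

    # Map: turn each (value, set id) pair into one flat key.
    # Out-of-range values are clamped to the first or last bin.
    def flat_key(value, set_id):
        bin_index = min(max(int((value - lo) / width), 0), n_bins - 1)
        return set_id * n_bins + bin_index

    keys = [flat_key(v, s) for v, s in zip(values, set_ids)]

    # Sort-by-key: equal (set, bin) keys become adjacent.
    keys.sort()

    # Reduce-by-key: the length of each run is the bin count.
    flat_counts = [0] * (n_sets * n_bins)
    for key, run in groupby(keys):
        flat_counts[key] = sum(1 for _ in run)

    # Reshape the flat counts into one row per set.
    return [flat_counts[s * n_bins:(s + 1) * n_bins] for s in range(n_sets)]


# Five samples split into two sets, two bins over [0, 1].
hist = multiset_histogram(
    values=[0.1, 0.9, 0.4, 0.6, 1.0],
    set_ids=[0, 0, 1, 1, 1],
    n_sets=2, n_bins=2, lo=0.0, hi=1.0,
)
print(hist)
```

Because every sample is processed independently in the map step and the counting is a keyed reduction, the work is spread over all samples of all sets rather than over the few samples of a single set.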

Introduction

The classic data analysis and visualization workflow first writes the raw data produced by the simulation to disk and then conducts the subsequent analysis. This workflow suffers from storage space limitations and I/O bottlenecks when the dataset is very large. To store data sampled from a continuous spatial domain in a file, we usually decouple it into attributes and the domain structure. The attributes are the data values obtained from simulation or observation at grid points.
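
The split between attributes and domain structure can be sketched as follows. This is an illustrative assumption, not a format taken from the paper: it uses a 2D regular grid, and the names `RegularGrid`, `index`, `neighbors`, and `temperature` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegularGrid:
    """Domain structure (assumed here to be a 2D regular grid)."""
    nx: int  # number of grid points along x
    ny: int  # number of grid points along y

    def index(self, i, j):
        # Location of grid point (i, j) in a flat, row-major attribute array.
        return j * self.nx + i

    def neighbors(self, i, j):
        # Topology: the axis-aligned neighbors that lie inside the grid.
        candidates = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
        return [(a, b) for a, b in candidates
                if 0 <= a < self.nx and 0 <= b < self.ny]


# The domain structure holds no data values, only how locations relate.
grid = RegularGrid(nx=3, ny=2)

# The attribute is a flat array with one value per grid point.
temperature = [10.0, 11.0, 12.0,
               20.0, 21.0, 22.0]

value = temperature[grid.index(1, 1)]
corner_neighbors = grid.neighbors(0, 0)
print(value, corner_neighbors)
```

Keeping the two parts separate means several attributes (for example temperature and pressure) can share one domain structure.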
