Correlation-aware probabilistic data summarization for large-scale multi-block scientific data visualization

Yang Yang,Yunhai Wang,Kecheng Lu,Yi Cao,Yu Wu

doi:10.1007/s41095-022-0304-6

Abstract

In this paper, we propose a correlation-aware probabilistic data summarization technique to efficiently analyze and visualize large-scale multi-block volume data generated by massively parallel scientific simulations. The core of our technique is correlation modeling of distribution representations of adjacent data blocks using copula functions and accurate data value estimation by combining numerical information, spatial location, and correlation distribution using Bayes’ rule. This effectively preserves statistical properties without merging data blocks in different parallel computing nodes and repartitioning them, thus significantly reducing the computational cost. Furthermore, this enables reconstruction of the original data more accurately than existing methods. We demonstrate the effectiveness of our technique using six datasets, with the largest having one billion grid points. The experimental results show that our approach reduces the data storage cost by approximately one order of magnitude compared to state-of-the-art methods while providing a higher reconstruction accuracy at a lower computational cost.

Full Text