Abstract

An accurate cost model that accounts for dataset size and structure can help optimize geoscience data analysis. We develop and apply a computational model to estimate data analysis costs for arithmetic operations on gridded datasets typical of satellite and climate model output. For these dataset geometries our model predicts data reduction scalings that agree with measurements of widely used geoscience data processing software, the netCDF Operators (NCO). I/O performance and library design dominate throughput for simple analysis (e.g. dataset differencing). Dataset structure can reduce analysis throughput ten-fold relative to same-sized unstructured datasets. We demonstrate algorithmic optimizations which substantially increase throughput for more complex, arithmetic-dominated analysis such as weighted averaging of multi-dimensional data. These scaling properties can help to estimate the costs of distribution strategies for data reduction in cluster and grid environments.
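To make the weighted-averaging operation concrete, here is a minimal NumPy sketch of an area-weighted spatial mean over a gridded time-lat-lon field. The grid shape, cosine-latitude weights, and variable names are illustrative assumptions, not NCO's actual implementation.

```python
import numpy as np

# Hypothetical gridded field (time x lat x lon), as produced by
# satellite retrievals or climate models; values are random stand-ins.
nt, nlat, nlon = 12, 64, 128
rng = np.random.default_rng(0)
field = rng.random((nt, nlat, nlon))

# Area weights proportional to cos(latitude), a common choice for
# rectangular lat-lon grids (assumed here for illustration).
lat = np.linspace(-88.6, 88.6, nlat)
w = np.cos(np.deg2rad(lat))

# Weighted spatial mean per time step: sum(w * x) / sum(w),
# with the latitude weight broadcast across longitude.
wmean = (field * w[None, :, None]).sum(axis=(1, 2)) / (w.sum() * nlon)
print(wmean.shape)  # one value per time step
```

The broadcast multiply touches every grid point once, which is why such reductions become arithmetic-dominated as dataset rank and size grow.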

Highlights

  • Scientific advances in geosciences increasingly depend on large scale computing (e.g. NRC 2001; NSF 2003)

  • The solutions to these problems include seamless or virtual data grids (e.g. Foster et al 2002; Cornillon, Gallagher and Sgouros 2003) and middleware which optimizes the distribution of data analysis across the available computing resources (e.g. Woolf, Haines and Liu 2003; Chen and Agrawal 2004)

  • We are interested in data analysis optimization for geoscience datasets stored on rectangular grids rather than, for example, polygonal meshes common in GIS applications


Introduction

Scientific advances in the geosciences increasingly depend on large-scale computing (e.g. NRC 2001; NSF 2003). Analysis and post-processing of the resulting tera-scale geoscience datasets presents its own set of problems. Solutions to these problems include seamless or virtual data grids (e.g. Foster et al 2002; Cornillon, Gallagher and Sgouros 2003) and middleware that optimizes the distribution of data analysis across the available computing resources (e.g. Woolf, Haines and Liu 2003; Chen and Agrawal 2004). We are interested in data analysis optimization for geoscience datasets stored on rectangular grids rather than, for example, the polygonal meshes common in GIS applications. Rectangular datasets are well suited to parallel analysis because their mutually independent coordinates facilitate decomposition into smaller datasets of finer granularity, e.g. chunking (Li et al 2003; Drake, Jones and Carr 2005).
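The decomposition property described above can be sketched in a few lines of NumPy: because the coordinates of a rectangular grid are mutually independent, a reduction over the whole dataset equals the combination of the same reduction applied to chunks along one coordinate. The shapes and chunk size below are illustrative assumptions.

```python
import numpy as np

# Sketch: decompose a rectangular grid along its independent time
# coordinate and reduce each chunk separately (sizes are illustrative).
nt, nlat, nlon = 240, 32, 64
data = np.arange(nt * nlat * nlon, dtype=float).reshape(nt, nlat, nlon)

chunk = 60  # time steps per chunk; each chunk could go to a worker
partial_sums = [data[t:t + chunk].sum(axis=0) for t in range(0, nt, chunk)]

# Recombining the per-chunk reductions reproduces the global reduction,
# which is what makes the decomposition safe to distribute.
total = sum(partial_sums)
assert np.allclose(total, data.sum(axis=0))
```

Each chunk can be processed independently, so the same pattern underlies distribution of data reduction across cluster or grid resources.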

