Abstract

This article presents a procedure for compressing massive geophysical datasets. The dataset is stratified geographically, and a penalized clustering algorithm is applied to each stratum independently. The algorithm, called Monte Carlo extended ECVQ, is based on the entropy-constrained vector quantizer algorithm (ECVQ). ECVQ trades off the error induced by compression against the amount of data reduction to produce a set of representative points, each of which stands for some number of input observations. Since the data are massive, a preliminary set of representatives is determined from a stratum sample, and the full stratum is then clustered by assigning each observation to its nearest representative. After the initial representatives are replaced by the means of these clusters, the new representatives and their associated counts form a compressed version, or summary, of the original stratum data. Because the initial set of representatives is determined from a sample, the final summary is subject to sampling variation. A statistical model for the relationship between compressed and uncompressed data provides a framework for assessing this variability. Test data from the International Satellite Cloud Climatology Project are used to demonstrate the procedure.
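To make the two-stage procedure concrete, the following is a minimal sketch, assuming squared-error distortion and the standard entropy-constrained assignment cost ||x − c_j||² + λ(−log₂ p_j). The function names (ecvq_fit, summarize_stratum) and parameters are hypothetical, and the paper's Monte Carlo extension involves details not reproduced here.

```python
import numpy as np

def ecvq_fit(sample, k, lam, n_iter=50, seed=None):
    """Stage 1 (sketch): entropy-constrained VQ on a stratum sample.
    Assignment minimizes ||x - c_j||^2 + lam * (-log2 p_j), so lam
    controls the trade-off between distortion and data reduction."""
    rng = np.random.default_rng(seed)
    codebook = sample[rng.choice(len(sample), k, replace=False)].astype(float)
    probs = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # Penalized assignment: squared error plus code-length penalty.
        d2 = ((sample[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        cost = d2 - lam * np.log2(np.maximum(probs, 1e-12))
        labels = cost.argmin(axis=1)
        # Update: cluster means and empirical codeword probabilities.
        for j in range(k):
            members = sample[labels == j]
            if len(members):
                codebook[j] = members.mean(axis=0)
        counts = np.bincount(labels, minlength=k)
        probs = counts / counts.sum()
    keep = probs > 0  # underused codewords lose all mass and drop out
    return codebook[keep]

def summarize_stratum(stratum, codebook):
    """Stage 2 (sketch): assign the full stratum to the nearest
    representatives, then replace each representative by its cluster
    mean; the means and counts are the compressed summary.
    (For truly massive strata the assignment would be done in chunks.)"""
    d2 = ((stratum[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    labels = d2.argmin(axis=1)
    counts = np.bincount(labels, minlength=len(codebook))
    reps = np.array([stratum[labels == j].mean(axis=0) if counts[j] else codebook[j]
                     for j in range(len(codebook))])
    return reps[counts > 0], counts[counts > 0]
```

In this formulation, the entropy penalty makes rarely used representatives expensive to assign to, so the effective codebook size adapts to the data rather than being fixed at k; because stage 1 sees only a sample, repeating the fit over different samples (the Monte Carlo aspect) exposes the sampling variability of the final summary.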
