Abstract

Scatterplots are essential tools for data exploration. However, they scale poorly with data size, with overplotting and excessive delay being the main problems. Generalization methods in the attribute domain focus on visual manipulations, but do not take into account the information redundancy inherent in most geographic data. These methods may also alter the statistical properties of the data. Recent developments in spatial statistics, particularly the formulation of effective sample size and the fast approximation of the eigenvalues of a spatial weights matrix, make it possible to assess the information content of a georeferenced data-set, which can serve as the basis for resampling such data. Experiments with both simulated data and actual remotely sensed data show that an equivalent scatterplot consisting of point clouds and fitted lines can be produced from a small subset extracted from a parent georeferenced data-set through spatial resampling. The spatially simplified data subset also maintains key statistical properties as well as the geographic coverage of the original data.

Highlights

  • A scatterplot is a statistical graph that uses a point symbol to depict a corresponding value pair in the Cartesian plane

  • Our objective is to find a solution to the problems of overplotting and excessive delays in generating scatterplots for large spatial data-sets, including remotely sensed data and their derivatives

  • Effective sample size is very sensitive to the SAR coefficient – at a lower level of spatial autocorrelation, variable Y has far fewer redundant data points than X
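
The sensitivity noted in the last highlight can be illustrated with a minimal sketch. It is not the authors' formulation: it assumes a simultaneous autoregressive (SAR) process on a small regular grid with row-standardized rook-contiguity weights, and it defines effective sample size through the variance of the sample mean, n* = n · tr(Σ) / (1ᵀΣ1), where Σ is the SAR covariance up to a scale factor. The function names `rook_weights` and `effective_sample_size`, the 15 × 15 grid, and the chosen coefficient values are all hypothetical. The dense matrix inverse is used only because the grid is tiny; the eigenvalue approximation mentioned in the abstract is what would make this feasible for large data-sets.

```python
import numpy as np


def rook_weights(rows, cols):
    """Row-standardized rook-contiguity weights for a rows x cols grid."""
    n = rows * cols
    W = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            # Neighbours share an edge: up, down, left, right.
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    W[i, rr * cols + cc] = 1.0
    # Divide each row by its neighbour count so every row sums to 1.
    return W / W.sum(axis=1, keepdims=True)


def effective_sample_size(W, rho):
    """Variance-of-the-mean effective sample size for a SAR process.

    Assumes y = rho * W y + e, so Cov(y) is proportional to
    (I - rho W)^-1 (I - rho W)^-T. This definition is an assumption
    of the sketch, not necessarily the one used in the paper.
    """
    n = W.shape[0]
    A_inv = np.linalg.inv(np.eye(n) - rho * W)
    sigma = A_inv @ A_inv.T  # covariance up to the error variance
    return n * np.trace(sigma) / sigma.sum()


if __name__ == "__main__":
    W = rook_weights(15, 15)
    for rho in (0.0, 0.3, 0.6, 0.9):
        print(f"rho = {rho:.1f}  n* = {effective_sample_size(W, rho):.1f}")
```

With no spatial autocorrelation (`rho = 0`) the effective sample size equals the number of grid cells, and it shrinks as `rho` grows. Under these assumptions, two variables observed at the same locations but with different SAR coefficients can therefore carry quite different amounts of redundant information.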


Summary

Introduction

A scatterplot is a statistical graph that uses a point symbol to depict a corresponding value pair in the Cartesian plane. With its two graphic layers, a point cloud and global/local fitted lines, a scatterplot can reveal various aspects of the relationship between two variables, based upon which researchers can specify more realistic statistical models. While this graphic tool works well for small data-sets, it loses its effectiveness when the number of data points becomes so large that points overlap, forming an indiscernible clump (Figure 1). Because each point symbol needs to be plotted on a graphic device, the time to render a scatterplot can be so long that interactive exploration becomes impossible. The problems of overplotting and excessive delay prevent a scatterplot from being used in exploring large data-sets, a common task in spatial data analysis and modeling. Cressie, Olsen, and Cook (1997) noted that statistical tools initially designed for small data-sets may be challenged by, and fail with, massive data-sets.

