Abstract

Ripley’s K functions are powerful tools for studying the spatial arrangement or spatiotemporal distribution characteristics of geographic phenomena and events in spatial analysis, and they have been used in many fields. However, K functions are compute-intensive because of point-wise distance comparisons, edge correction, and the simulations required for significance tests. Although parallel computing technologies have been adopted to accelerate K functions, previous works have not extended the optimization from the spatial to the space-time dimension. This study presents an acceleration method for K functions built on the state-of-the-art distributed computing framework Apache Spark. Four optimization strategies are used to simplify the calculation procedures and accelerate distributed computing: 1) spatiotemporal indexing based on an R-tree built with the Sort-Tile-Recursive (STR) algorithm, which reduces distance comparisons when retrieving potential spatiotemporally neighbouring points; 2) hash-table-based caching, which reuses spatiotemporal edge correction weights and reduces repetitive computation; 3) spatiotemporal partitioning using a KDB-tree together with a cylinder intersection redundancy strategy, which decreases ghost buffer redundancy in partitions and supports near-balanced distributed processing; 4) customized serialization of spatiotemporal objects and indexes, which lowers the overhead of data transmission. Experiments verify the effectiveness and time efficiency of the proposed optimization strategies and evaluate the overall performance and scalability. Based on the proposed methods, a web-based visual analytics framework has been developed and publicly shared through GitHub. Four types of distributed K functions are implemented (space, space-time, local, and cross K functions), demonstrating the value of the approach for geographical and socioeconomic studies.
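To illustrate why the point-wise distance comparisons are costly, the following is a minimal sketch of a naive space-time K estimate, not the authors' implementation. It assumes one common uncorrected form, K(s, t) = |A| T / n² × (number of ordered pairs within spatial distance s and temporal distance t), and omits edge correction and significance simulation entirely. The function name `space_time_k` and its parameters are hypothetical.

```python
from math import hypot


def space_time_k(points, area, duration, s, t):
    """Naive, uncorrected space-time K estimate (illustrative assumption).

    points   : list of (x, y, time) tuples
    area     : area of the study region |A|
    duration : length of the study period T
    s, t     : spatial and temporal distance thresholds
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")

    count = 0
    # Every ordered pair is compared, so the cost grows as O(n^2).
    # This is the step that a spatiotemporal index is meant to shrink.
    for i, (xi, yi, ti) in enumerate(points):
        for j, (xj, yj, tj) in enumerate(points):
            if i == j:
                continue
            # A pair counts when point j falls inside the cylinder of
            # radius s and half-height t centred on point i.
            if hypot(xi - xj, yi - yj) <= s and abs(ti - tj) <= t:
                count += 1

    return area * duration * count / (n * n)
```

For example, `space_time_k([(0, 0, 0), (1, 0, 0), (0, 0, 5), (10, 10, 10)], 100.0, 10.0, 1.5, 1.0)` counts only the first two points as spatiotemporal neighbours. Each significance simulation would repeat this full pairwise pass, which is why indexing, caching, and partitioning matter at scale.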

Highlights

  • Effective approaches for detecting and studying the spatial arrangement or spatiotemporal distribution characteristics of geographic points help to investigate and interpret the spatiotemporal point process hidden behind geographic phenomena or events (Cui et al., 2017)

  • The parameters of the K function can be derived from the study area, unlike the bandwidth in kernel density estimation (KDE), which usually relies on experience (Yuan et al., 2019)

  • When the driver program is submitted to the master, computing tasks are generated according to the submitted K function, and computing resources for the job are allocated according to the Apache Spark parameters

Summary

INTRODUCTION

Effective approaches for detecting and studying the spatial arrangement or spatiotemporal distribution characteristics of geographic points help to investigate and interpret the spatiotemporal point process hidden behind geographic phenomena or events (Cui et al., 2017). The time efficiency of the classical desktop-based packages is far from satisfactory for large data volumes. This significantly affects the user experience of geoprocessing (Hu et al., 2019) and impedes further application. Existing studies that accelerate the computation of K functions focus mainly on the spatial dimension, and the temporal dimension is seldom involved. These implementations are also limited in scalability and workflow optimization because of the relatively high programming cost of the parallel frameworks they use.

METHODOLOGY
R-tree-based Spatiotemporal Indexing
Spatiotemporal Weight Caching
KDB-tree-based Spatiotemporal Partitioning
Customized Serialization for Data Transmission
EXPERIMENTS
Performance of Spatiotemporal Indexing
Performance of Spatiotemporal Partitioning
Performance of Integrated Four Optimization Strategies
Performance of Customized Serialization
Overall Speedup Analysis
Findings
WEB-BASED VISUAL ANALYTICS FRAMEWORK