Abstract

With the rapid development of mobile data acquisition technology, the volume of available spatial data is growing at an increasingly fast pace. The real-time processing of big spatial data has become a research frontier in the field of Geographic Information Systems (GIS). To cope with these highly dynamic data, we aim to reduce the time complexity of data updating by modifying the traditional spatial index. However, existing algorithms and data structures are designed for a single worker node, which cannot handle the required numbers and update rates of moving objects. In this paper, we present a distributed spatial index based on Apache Storm, an open-source distributed real-time computation system. Using this approach, we compare the range and K-nearest neighbor (KNN) query efficiency of four spatial indexes on a single dataset and introduce a method of performing spatial joins between two moving datasets. In particular, we build a secondary distributed index for spatial join queries based on the grid-partition index. Finally, a series of experiments is presented to explore the factors that affect the performance of the distributed index and to demonstrate the feasibility of the proposed distributed index based on Storm. As a real-world application, this approach has been integrated into an information system that provides real-time traffic decision support.

Highlights

  • Advanced technologies for sensing and computing have resulted in the creation of massive datasets consisting of trajectories of people and vehicles

  • We present a distributed spatial index based on Apache Storm, an open-source distributed real-time computation system

  • You et al. proposed an approach using a memory-based distributed computing framework to address the problem of spatial join operations, because Hadoop must store its intermediate calculation results on hard disk, and the resulting disk I/O cost reduces analytical efficiency [3]

Summary

Introduction

Advanced technologies for sensing and computing have resulted in the creation of massive datasets consisting of trajectories of people and vehicles. You et al. proposed an approach using a memory-based distributed computing framework to address the problem of spatial join operations, because Hadoop must store its intermediate calculation results on hard disk, and the resulting disk I/O cost reduces analytical efficiency [3]. Although these cloud technologies offer the benefits of high-performance computing for GIS applications based on static SBD, a novel approach to cloud computing is needed to cope with highly dynamic spatial data. To lay the foundations for effective visual analytics of trajectory datasets, this manuscript presents a method of building a distributed spatial index on an open-source cloud computing framework, namely, Apache Storm, which has been playing an important role in traffic statistics [5] and network anomaly detection [6]. Based on this technology, we conduct real-time spatial querying of spatial fast data (SFD), including range querying, KNN querying, and spatial join querying.

Related Work and Background
Spatial Querying on a Distributed Platform
Distributed Streaming Processing Framework
Storm Topology Programming Paradigm
Problem Setting
Semantic Information
Storm Topology Algorithm for a Single Dataset
Result
Spatial Joins for Moving Objects in a Storm Topology
Results
Query Partition Bolt
Distributed Spatial Index Bolt
Experimental Study
Experimental Setting and Workloads
Experimental Results