Abstract
In this paper, we propose a method for ingesting big spatiotemporal data in parallel in a cluster environment. In addition to parallel ingestion, the method includes an indexing scheme for efficient retrieval: a dynamic multilevel grid index that maximizes parallelism and adapts to skewed spatiotemporal data. Experiments in a cluster environment show that ingestion and query throughput increase as the number of nodes increases.
Highlights
A large amount of spatiotemporal data has been generated, and applications of spatiotemporal data have been increasing
We propose a real-time parallel ingestion method for big spatiotemporal data that, like GeoMesa, uses Apache Accumulo
We identify the cells that need to be accessed during query processing and perform query processing on the tablets corresponding to each cell in parallel
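The cell-identification step in the last highlight can be sketched as follows. This is only an illustration under stated assumptions, not the paper's implementation: it uses a fixed single-level 2D grid rather than the dynamic multilevel index, omits the time dimension, and stands in for Accumulo tablets with a plain in-memory dictionary. The names `CELL`, `cell_of`, `cells_for_query`, `scan_cell`, `ingest`, and `range_query` are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from math import floor

CELL = 10.0  # assumed cell width and height (hypothetical value)


def cell_of(x, y):
    """Map a point to the (column, row) id of the grid cell containing it."""
    return (floor(x / CELL), floor(y / CELL))


def ingest(points):
    """Group points by cell; each dict entry stands in for one tablet."""
    tablets = {}
    for p in points:
        tablets.setdefault(cell_of(p[0], p[1]), []).append(p)
    return tablets


def cells_for_query(xmin, ymin, xmax, ymax):
    """List every cell that overlaps the query rectangle."""
    cx0, cy0 = cell_of(xmin, ymin)
    cx1, cy1 = cell_of(xmax, ymax)
    return [(cx, cy)
            for cx in range(cx0, cx1 + 1)
            for cy in range(cy0, cy1 + 1)]


def scan_cell(tablets, cell, box):
    """Filter one cell's points against the exact query rectangle."""
    xmin, ymin, xmax, ymax = box
    return [p for p in tablets.get(cell, [])
            if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax]


def range_query(tablets, box):
    """Scan all overlapping cells in parallel and merge the results."""
    cells = cells_for_query(*box)
    with ThreadPoolExecutor() as pool:
        parts = pool.map(lambda c: scan_cell(tablets, c, box), cells)
        return sorted(pt for part in parts for pt in part)
```

Only the cells that overlap the query rectangle are scanned, and each scan is independent, which is what allows the per-cell work to run in parallel.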
Summary
A large amount of spatiotemporal data has been generated, and applications of spatiotemporal data have been increasing. References [7,8,9] proposed Apache Spark-based methods for processing big spatiotemporal data. Reference [5] proposes an Apache Spark-based indexing method for moving objects that manages the index and stores location data in distributed memory. GeoMesa builds on Apache Accumulo, which uses a memory table, a cache, and write-ahead logs to alleviate I/O overhead. It exploits the data distribution feature of Apache Accumulo to index spatiotemporal data with space-filling curve techniques. We propose a real-time parallel ingestion method for big spatiotemporal data that, like GeoMesa, uses Apache Accumulo. The proposed parallel query processing and ingestion methods are based on the table split feature of Apache Accumulo and use in-memory storage.
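To illustrate what indexing with a space-filling curve means, the sketch below computes a Z-order (Morton) key by interleaving the bits of two quantized coordinates. It is a generic example of the technique, not GeoMesa's actual key layout or the index proposed here. The function names, the 16-bit resolution, and the longitude/latitude scaling are assumptions made for the example.

```python
def interleave_bits(x, y, bits=16):
    """Interleave the low `bits` bits of x and y into one Z-order value.

    Bit i of x goes to position 2*i, bit i of y to position 2*i + 1.
    """
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def z_key(lon, lat, bits=16):
    """Quantize a longitude/latitude pair and return its Z-order key."""
    scale = (1 << bits) - 1
    xi = int((lon + 180.0) / 360.0 * scale)  # map [-180, 180] to [0, scale]
    yi = int((lat + 90.0) / 90.0 / 2.0 * scale)  # map [-90, 90] to [0, scale]
    return interleave_bits(xi, yi, bits)
```

Points that are close in space tend to receive nearby keys, so a sorted key-value store can serve a spatial range with a small number of key-range scans, and splitting the key space distributes the data across nodes.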