Abstract

Several modern-day problems need to deal with large amounts of spatio-temporal data. As such, in order to meet application requirements, more and more systems are adapting to the specificities of these data. The most prominent case is perhaps that of data storage systems, which have developed a large number of functionalities to efficiently support spatio-temporal data operations. This work is motivated by the question of which of those data storage systems is better suited to address the needs of industrial applications. In particular, the work conducted set out to identify the most efficient data store in terms of response time, comparing two of the most representative systems of the two categories (NoSQL and relational), namely MongoDB and PostgreSQL. The evaluation is based upon real business scenarios and their resulting queries, as well as their underlying infrastructures, and concludes by confirming the superiority of PostgreSQL in almost all cases, with the exception of the polygon intersection queries. Furthermore, the average response time is radically reduced with the use of indexes, especially in the case of MongoDB.
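As a rough illustration of the kind of operation compared, the sketch below shows a spatial index and a polygon intersection query against both stores, written in Python with psycopg2 and pymongo. The table, collection, column and field names ("positions", "geom", "location", "mmsi") are illustrative assumptions and not the schema used in the study.

    # Illustrative sketch (hypothetical schema): spatial index creation and a
    # polygon-intersection query against both stores, from Python.
    import psycopg2
    from pymongo import MongoClient

    AREA_WKT = ("POLYGON((23.5 37.8, 23.9 37.8, 23.9 38.1, "
                "23.5 38.1, 23.5 37.8))")
    AREA_GEOJSON = {
        "type": "Polygon",
        "coordinates": [[[23.5, 37.8], [23.9, 37.8], [23.9, 38.1],
                         [23.5, 38.1], [23.5, 37.8]]],
    }

    # PostgreSQL / PostGIS: GiST index on the geometry column, then the query.
    pg = psycopg2.connect("dbname=ais user=postgres")
    cur = pg.cursor()
    cur.execute("CREATE INDEX IF NOT EXISTS positions_geom_idx "
                "ON positions USING GIST (geom);")
    pg.commit()
    cur.execute("SELECT DISTINCT mmsi FROM positions "
                "WHERE ST_Intersects(geom, ST_GeomFromText(%s, 4326));",
                (AREA_WKT,))
    pg_ships = [row[0] for row in cur.fetchall()]

    # MongoDB: 2dsphere index on a GeoJSON field, then the equivalent query.
    coll = MongoClient("mongodb://localhost:27017").ais.positions
    coll.create_index([("location", "2dsphere")])
    mongo_ships = coll.distinct(
        "mmsi",
        {"location": {"$geoIntersects": {"$geometry": AREA_GEOJSON}}},
    )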

Highlights

  • The volumes of spatial data generated by modern-day systems have grown at a staggering rate during the last few years

  • Numerous business applications are emerging that process the 285 billion points per year on aircraft movements gathered from the Automatic Dependent Surveillance-Broadcast (ADS-B) system [3], the 60 Mb of Automatic Identification System (AIS) and weather data collected every second by MarineTraffic’s on-line monitoring service [4], or the 4 million geotagged tweets produced daily on Twitter [5]

  • The average response time is smaller in the case of MongoDB and, in some cases, reduced by half compared to PostgreSQL

Summary

Introduction

The volumes of spatial data generated by modern-day systems have grown at a staggering rate during the last few years. Managing and analyzing these data is becoming increasingly important, enabling novel applications that may transform science and society. Distributed database systems have proven instrumental in the effort to deal with this data deluge. These systems are distinguished by two key characteristics: a) system scalability: the underlying database system must be able to manage and store a huge amount of spatial data and to allow applications to efficiently retrieve it; and b) interactive performance: very fast response times to client requests. Performance is measured using a set of spatio-temporal queries that mimic real case scenarios, executed on a dataset provided by MarineTraffic. The document is structured as follows: Section 2 provides details about the related work on spatio-temporal systems and benchmark analysis; Section 3 gives a technology overview; Section 4 describes the evaluation of the spatio-temporal database systems used; Section 5 presents the experimental results; and Section 6 presents the conclusions of this study and future work.
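For illustration, a minimal sketch of one such spatio-temporal query, retrieving all positions inside a bounding box during a time window, is given below for both systems. The schema names ("positions", "geom", "ts", "location", "mmsi") are assumptions for the sketch and are not taken from the paper.

    # Illustrative sketch (hypothetical schema): positions inside a bounding
    # box during a one-day time window, against both stores.
    from datetime import datetime
    import psycopg2
    from pymongo import MongoClient

    T0, T1 = datetime(2017, 1, 1), datetime(2017, 1, 2)
    LON_MIN, LAT_MIN, LON_MAX, LAT_MAX = 23.0, 37.5, 24.0, 38.5
    BBOX_GEOJSON = {  # the same box expressed as a GeoJSON polygon for MongoDB
        "type": "Polygon",
        "coordinates": [[[LON_MIN, LAT_MIN], [LON_MAX, LAT_MIN],
                         [LON_MAX, LAT_MAX], [LON_MIN, LAT_MAX],
                         [LON_MIN, LAT_MIN]]],
    }

    # PostgreSQL / PostGIS: bounding-box operator plus a time-range predicate.
    pg = psycopg2.connect("dbname=ais user=postgres")
    cur = pg.cursor()
    cur.execute(
        "SELECT mmsi, ts FROM positions "
        "WHERE geom && ST_MakeEnvelope(%s, %s, %s, %s, 4326) "
        "AND ts BETWEEN %s AND %s;",
        (LON_MIN, LAT_MIN, LON_MAX, LAT_MAX, T0, T1),
    )
    pg_rows = cur.fetchall()

    # MongoDB: $geoWithin over a 2dsphere-indexed GeoJSON field plus a ts range.
    coll = MongoClient("mongodb://localhost:27017").ais.positions
    mongo_rows = list(coll.find(
        {"location": {"$geoWithin": {"$geometry": BBOX_GEOJSON}},
         "ts": {"$gte": T0, "$lt": T1}},
        {"mmsi": 1, "ts": 1, "_id": 0},
    ))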

Benchmarks for spatio-temporal database evaluation
Distributed systems and technologies for spatial data processing
Technology overview
Dataset overview
Use case - queries
System architecture
Data ingestion
Cluster setup - AWS
Experimental evaluation
Conclusions