Abstract

In this paper we benchmark a previously introduced big data platform for the analysis of remote sensing and other geospatial-temporal data. The platform, called IBM PAIRS Geoscope, has been developed by leveraging open-source big data technologies (Hadoop/HBase) that are in principle scalable in storage and compute to hundreds of PetaBytes. Currently, PAIRS hosts multiple PetaBytes of curated, geospatial-temporally indexed data. It organizes all data as key-value combinations and performs analytics close to the data to minimize data movement.
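
The exact PAIRS key layout is not detailed in this abstract; the sketch below is a minimal illustration, assuming a composite row key of (layer ID, spatial cell, timestamp), of how geospatial-temporal rasters can be organized as key-value pairs in an HBase-like store so that nearby pixels and times end up adjacent on disk.

```python
import struct
from datetime import datetime, timezone

def make_row_key(layer_id: int, lat: float, lon: float, timestamp: datetime,
                 cell_deg: float = 0.01) -> bytes:
    """Pack a composite row key: (layer, spatial cell, time).

    Illustrative layout only, not the actual PAIRS key schema: grouping by
    layer and spatial cell keeps values for the same location and time range
    adjacent on disk, so range scans touch few regions.
    """
    cell_y = int((lat + 90.0) / cell_deg)    # coarse latitude cell index
    cell_x = int((lon + 180.0) / cell_deg)   # coarse longitude cell index
    epoch = int(timestamp.replace(tzinfo=timezone.utc).timestamp())
    # fixed-width, big-endian, unsigned fields: byte-wise (lexicographic)
    # ordering of the keys matches (layer, cell_y, cell_x, time) ordering
    return struct.pack(">HIIQ", layer_id, cell_y, cell_x, epoch)

# Example: key for a hypothetical layer (id 42) at 40.71N, -74.01E on 2020-01-01
key = make_row_key(42, 40.71, -74.01, datetime(2020, 1, 1))
print(key.hex())
```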

Highlights

  • Remote sensing data useful for government, industry, and research is accumulating at an ever-increasing rate, with satellite imagery alone exceeding several hundred TeraBytes per day 1

  • Data archives are increasing even faster, because most applications require multiple data sets for context and comparison, especially for machine-learning models where historical data sets are required for training and validation. This rapid growth leads to the notion of data gravity, which results from two observations: first, remote sensing data is becoming too big to move, so it stays where it is; second, the value of data is increased by proximity to other data, so it attracts more data

  • To facilitate the improved discovery of information in remote sensing data, IBM has built a cloud-based platform that continuously ingests and curates more than 10 TeraBytes per day and presents it to multiple users, for example through a web-based interface for search, analysis, and viewing (a query sketch follows this list). This platform, called PAIRS Geoscope, has been developed using open-source big data technologies that are in principle scalable to hundreds of PetaBytes [1]
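
The platform's public query interface is not described in this summary; the following is a hypothetical sketch, assuming a placeholder REST endpoint and JSON payload, of how a user might request a raster layer for a bounding box and time window from such a web-based service.

```python
import requests

# Hypothetical endpoint and payload; the actual PAIRS API is not specified here.
PAIRS_QUERY_URL = "https://example.com/pairs/v2/query"  # placeholder URL

query = {
    "layers": [{"id": "temperature-2m"}],          # hypothetical layer identifier
    "spatial": {                                   # bounding box: SW and NE corners
        "type": "square",
        "coordinates": [40.5, -74.3, 41.0, -73.7],
    },
    "temporal": {                                  # one-day time window (UTC)
        "intervals": [{"start": "2020-01-01T00:00:00Z",
                       "end":   "2020-01-02T00:00:00Z"}],
    },
}

response = requests.post(PAIRS_QUERY_URL, json=query, auth=("user", "password"))
response.raise_for_status()
print(response.json())   # e.g. a job ID to poll, or the result payload itself
```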

Summary

INTRODUCTION

Remote sensing data useful for government, industry, and research is accumulating at an ever-increasing rate, with satellite imagery alone exceeding several hundred TeraBytes per day 1. Data archives are increasing even faster, because most applications require multiple data sets for context and comparison, especially for machine-learning models where historical data sets are required for training and validation. This rapid growth leads to the notion of data gravity, which results from two observations: first, remote sensing data is becoming too big to move, so it stays where it is; second, the value of data is increased by proximity to other data, so it attracts more data. In this sense, data gravity means that future platforms must heavily leverage public or private cloud computing to achieve scalability, and they need to move the analytics and computation to the data rather than the traditional way, where the data are moved to the applications. In the system described here, such a task can be accomplished with minimal data movement and file opening.
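
PAIRS's internal query engine is not shown here; the snippet below is a minimal sketch of the "move the computation to the data" idea, assuming an HBase table reachable through the Thrift gateway and the third-party happybase client. The row prefix and value filter are evaluated on the HBase region servers that hold the data, so only matching cells cross the network.

```python
import happybase  # third-party HBase client that talks to the Thrift gateway

# Connection details and table layout are placeholders, not the PAIRS deployment.
connection = happybase.Connection("hbase-thrift.example.com", port=9090)
table = connection.table("raster_cells")  # hypothetical table of pixel values

# Both the row-key prefix (layer + spatial cell, as in the key sketch above)
# and the value filter run server-side on the region servers; only matching
# cells are returned to the client.
hot_pixels = table.scan(
    row_prefix=b"\x00\x2a",  # hypothetical prefix selecting layer id 42
    columns=[b"d:value"],
    # illustrative filter; assumes values are stored in a byte-comparable encoding
    filter="SingleColumnValueFilter('d', 'value', >, 'binary:300')",
)

for row_key, cells in hot_pixels:
    print(row_key.hex(), cells[b"d:value"])
```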

IBM PAIRS GEOSCOPE OVERVIEW
LAYERS TO ACCELERATE ANALYTICS
PERFORMANCE BENCHMARK OF PAIRS
CONCLUSION