SIDELOADING – INGESTION OF LARGE POINT CLOUDS INTO THE APACHE SPARK BIG DATA ENGINE

J Boehm, K Liu, C Alis

doi:10.5194/isprs-archives-xli-b2-343-2016

Abstract

In the geospatial domain we have now reached the point where the data volumes we handle have clearly grown beyond the capacity of most desktop computers. This is particularly true in the area of point cloud processing. It is therefore attractive to explore established big data frameworks for big geospatial data. The very first hurdle is the import of geospatial data into big data frameworks, commonly referred to as data ingestion. Geospatial data is typically encoded in specialised binary file formats, which are not natively supported by the existing big data frameworks. Instead, such file formats are supported by software libraries that are restricted to single-CPU execution. We present an approach that allows the use of existing point cloud file format libraries on the Apache Spark big data framework. We demonstrate the ingestion of large volumes of point cloud data into a compute cluster. The approach uses a map function to distribute the data ingestion across the nodes of a cluster. We test the capabilities of the proposed method to load billions of points into a commodity hardware compute cluster, and we discuss the implications for scalability and performance. The performance is benchmarked against an existing native Apache Spark data import implementation.
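
To illustrate the idea of distributing ingestion with a map function, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical flat binary layout of three float64 values per point, with `read_points` standing in for a single-CPU file format library. The names `read_points`, `ingest`, `local_flat_map` and `write_demo_files` are all illustrative. A local flat-map is used so the sketch runs without a cluster; on Spark the same per-file reader could plausibly be passed to `flatMap` on an RDD of file paths, assuming every worker can reach the files.

```python
import struct
import tempfile
from itertools import chain
from pathlib import Path

# Hypothetical record layout (an assumption for this sketch):
# three little-endian float64 values per point (x, y, z).
POINT = struct.Struct("<3d")


def read_points(path):
    """Stand-in for a single-CPU point cloud library: yield (x, y, z) per point in one file."""
    data = Path(path).read_bytes()
    for x, y, z in POINT.iter_unpack(data):
        yield (x, y, z)


def local_flat_map(func, items):
    """Local substitute for a distributed flat-map: apply func to each item, concatenate results."""
    return list(chain.from_iterable(map(func, items)))


def ingest(paths, flat_map):
    """Map the per-file reader over a list of file paths.

    On a Spark cluster the equivalent step might look like
        sc.parallelize(paths, len(paths)).flatMap(read_points)
    which assumes the files are readable from every worker node.
    """
    return flat_map(read_points, paths)


def write_demo_files(directory, n_files=3, points_per_file=4):
    """Write small synthetic point files so the sketch can be run end to end."""
    paths = []
    for i in range(n_files):
        path = Path(directory) / f"tile_{i}.bin"
        with open(path, "wb") as fh:
            for j in range(points_per_file):
                fh.write(POINT.pack(float(i), float(j), 0.0))
        paths.append(str(path))
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo_paths = write_demo_files(tmp)
        points = ingest(demo_paths, local_flat_map)
        print(len(points), "points ingested from", len(demo_paths), "files")
```

Only the list of file paths is distributed in this pattern; each worker opens its own files with the existing reader, which is why the reader itself does not need to be parallel.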

Highlights

While the origins of the term “Big Data” itself might be complex (Diebold, 2012) and disputed, one of the most commonly accepted definitions of the term was given by Laney (2001).
For this work we focus on the first step in the Big Data pipeline, the import of data into the framework, commonly referred to as data ingestion.
Apache Spark heavily relies on the concept of Resilient Distributed Datasets (RDDs) (Zaharia et al., 2012).

Summary

INTRODUCTION

While the origins of the term “Big Data” itself might be complex (Diebold, 2012) and disputed, one of the most commonly accepted definitions of the term was given by Laney (2001). He observes “data management challenges along three dimensions: volume, velocity and variety”. The phenomenon is not unknown to the geospatial community and big spatial data has been identified as an emerging research trend (Eldawy and Mokbel, 2015a). We will focus on a special area of big spatial data and a particular challenge in data management.

Point Cloud Data Use

Geo Data as Big Data

Cloud Compute Engines

IQmulus Architecture

Spark SQL IQmulus Library

Single CPU libraries

PROPOSED METHOD

Naïve Sideloading

Slicing

Data Sets

Medium Dataset

Large Dataset

Journal: ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol. XLI-B2
Publication Date: Jun 7, 2016
Citations: 10
License type: CC BY 3.0
