Abstract

Modern geoinformation technologies for data collection and processing, such as laser scanning and photogrammetry, can generate point clouds containing billions of points. These clouds provide abundant information that can be used for many types of analysis. Owing to its characteristics, the point cloud is often regarded as a special type of geospatial data. Managing such volumes of data efficiently requires techniques based on a computer cluster. The Apache Spark framework has proven to be an efficient solution for processing large volumes of data. This paper thoroughly examines the representation of the point cloud data type using Apache Spark constructs. Common operations over point clouds, range queries and k-nearest neighbors (kNN) queries, are implemented using the Apache Spark DataFrame Application Programming Interface (API), which enabled the design of point-cloud-related user-defined types (UDTs) and user-defined functions (UDFs). The structuring of point clouds for efficient storage in Big Data key-value stores is also analyzed and described. The presented methods are compared to the PostgreSQL RDBMS, and the results are discussed.
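
As a rough illustration of the approach, the sketch below filters a point cloud with an axis-aligned range query expressed through the Spark DataFrame API and a UDF. This is a minimal sketch, not the paper's implementation; the column names (x, y, z), the box predicate, and the toy data are illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, udf}

object RangeQuerySketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("point-cloud-range-query")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Toy point cloud: one row per point, with x, y, z coordinates.
    val points = Seq(
      (0.5, 1.2, 10.3), (3.7, 2.1, 11.0), (7.9, 8.4, 9.8)
    ).toDF("x", "y", "z")

    // UDF testing whether a point lies inside an axis-aligned 2D box;
    // the paper wraps comparable logic in point-cloud-specific UDFs.
    val inBox = udf((x: Double, y: Double,
                     xMin: Double, yMin: Double,
                     xMax: Double, yMax: Double) =>
      x >= xMin && x <= xMax && y >= yMin && y <= yMax)

    // Range query: keep only the points inside the box (0,0)-(5,5).
    points
      .filter(inBox(col("x"), col("y"),
                    lit(0.0), lit(0.0), lit(5.0), lit(5.0)))
      .show()

    spark.stop()
  }
}
```

A full design along the paper's lines would presumably carry the point geometry in a single typed UDT column rather than separate coordinate columns, so the predicate logic moves into UDFs operating on that type.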

Highlights

  • The development of satellite remote sensing, global navigation systems, aerial photogrammetric cameras, sensor networks, radar remote sensing and laser scanning has contributed to an exponential increase in the amount of collected geospatial data [1]

  • This paper presents a method for point cloud management in a distributed environment based on the Apache Spark framework

  • Research on this topic is increasing every year, and software products for geospatial big data are developing at a fast pace


Introduction

The development of satellite remote sensing, global navigation systems, aerial photogrammetric cameras, sensor networks, radar remote sensing and laser scanning has contributed to an exponential increase in the amount of collected geospatial data [1]. The amount of collected data largely exceeds what can be stored on individual computers and requires storage on a computer cluster. Apache Spark introduces a new platform for distributed processing that runs on a cluster resource manager such as Mesos or YARN. It is designed to maintain the scalability and fault tolerance of MapReduce programs, and to support applications for which MapReduce was inefficient. This is achieved through its in-memory execution model.
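
The snippet below is a minimal Scala sketch of that in-memory execution model: a dataset is cached once and reused by several actions, avoiding the repeated disk I/O of a MapReduce-style pipeline. The input path, schema, and column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object InMemorySketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("in-memory-execution")
      .master("local[*]")   // on a cluster: YARN or Mesos instead
      .getOrCreate()

    // Hypothetical point-cloud file; path and schema are assumptions.
    val points = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/points.csv")
      .cache()   // keep partitions in executor memory after the first action

    // Both actions reuse the cached partitions; only the first one
    // actually reads the file from storage.
    println(s"total points: ${points.count()}")
    points.describe("x", "y", "z").show()

    spark.stop()
  }
}
```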

