Abstract

This article describes the possibilities and advantages of using distributed systems to process and analyze remote sensing data. The base task chosen to test the approach is the preparation and processing of several types of remote sensing data (multispectral satellite images, values of climatic indicators, elevation data) that will later be used to build a simulation model of a hydroelectric power plant. Existing approaches to the distributed processing of spatial data of various types (vector cartographic objects, raster data, point clouds, graphs) are analyzed. The developed approach is described and the choice of its components is justified. The preprocessing operations applied to the raster data are described. An approach to raster data segmentation based on libraries for distributed machine learning is considered. The data processing speed of various machine learning and processing algorithms is compared.

Highlights

  • Geospatial and remote sensing data, because of their very large volume, variety and update rate, are one of the main elements of the big data concept.

  • Traditional approaches rely on the power of computing stations to process data, but they can only scale vertically and at some point physically cannot cope with the continuous growth in the volume of processed data [7–9]. This problem is most often solved with parallel and distributed processing technologies, which process each part of the data set simultaneously on a separate node and combine the intermediate results into the final one [11–13].

  • The problems the authors face (predicting the spread of tropical diseases, building simulation models of hydroelectric power plants, and building databases of natural resource potential) require processing large volumes of constantly updated remote sensing data covering individual regions and entire countries.

Summary

Introduction

Geospatial and remote sensing data, because of their very large volume, variety and update rate, are one of the main elements of the big data concept. Traditional approaches rely on the power of computing stations to process data, but they can only scale vertically (which is always costly and severely limited by the hardware platform) and at some point physically cannot cope with the continuous growth in the volume of processed data [7–9]. This problem is most often solved with parallel and distributed processing technologies, which process each part of the data set simultaneously on a separate node and combine the intermediate results into the final one [11–13]. To do this, it was necessary to analyze the existing open-source software for distributed processing of spatial data, determine its features, advantages and disadvantages, and evaluate how effectively it can be applied to particular datasets.
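The split, process and combine pattern described above can be illustrated with a small sketch. It is not the authors' implementation: the function names (`split_into_tiles`, `partial_stats`, `merge_stats`, `distributed_stats`) and the toy raster are hypothetical, and a local thread pool stands in for the separate nodes of a real cluster. Each tile is summarised independently, and the intermediate results are then merged into the final one.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_tiles(raster, tile_rows):
    """Split a raster (a list of rows) into strips of at most tile_rows rows."""
    return [raster[i:i + tile_rows] for i in range(0, len(raster), tile_rows)]


def partial_stats(tile):
    """Map step: summarise one tile without looking at any other tile."""
    values = [v for row in tile for v in row]
    return {
        "count": len(values),
        "total": sum(values),
        "min": min(values),
        "max": max(values),
    }


def merge_stats(a, b):
    """Reduce step: combine two intermediate summaries into one."""
    return {
        "count": a["count"] + b["count"],
        "total": a["total"] + b["total"],
        "min": min(a["min"], b["min"]),
        "max": max(a["max"], b["max"]),
    }


def distributed_stats(raster, tile_rows=2, workers=4):
    """Summarise a non-empty raster tile by tile.

    Assumption: the thread pool only imitates worker nodes; a real
    deployment would hand the tiles to a cluster framework instead.
    """
    tiles = split_into_tiles(raster, tile_rows)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_stats, tiles))
    return reduce(merge_stats, partials)


# Toy 5 x 4 raster with made-up values, used only to exercise the sketch.
raster = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
    [17, 18, 19, 20],
]
summary = distributed_stats(raster)
mean = summary["total"] / summary["count"]
```

The sketch carries count and total through the merge step instead of per-tile means, because a mean of means would be wrong when tiles differ in size (here the last strip has one row, the others two).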
