Abstract

A methodology is proposed for representing spatial data processing pipelines in a relational database within the computing backend of the online information-analytical system “Climate” (http://climate.scert.ru). Each pipeline is represented by a sequence of instructions for the computing backend that describe how to run data processing modules and pass datasets between them (from the output of one module to the input of another), including raw data and final computational results in graphical or binary formats. Storing the descriptions of the processing pipelines used in the “Climate” system in a relational database provides flexibility and efficiency when adding and developing spatial data processing modules. It also allows computing pipelines to be scaled for further implementation on multiprocessor systems.

Highlights

  • The data analysis process is a set of sequential operations that starts with data search and retrieval and ends with the output of results in the required format

  • This paper describes a methodology for representing computing pipelines as modified labeled directed multigraphs and then translating them into a relational database

  • To represent a computing pipeline in a relational database such as MySQL, it is convenient to first depict it as a graph reflecting the workflow [4]
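
The idea in the highlights above can be sketched in a few lines. This is a minimal illustration only, not the schema or code of the “Climate” system: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table names (vertex, edge), the module names (read_data, compute_index, write_image, write_binary), and the dataset labels are all hypothetical. Vertices stand for processing modules, labeled edges stand for datasets passed from one module's output to another's input, and parallel edges between the same pair of modules make the graph a multigraph.

```python
import sqlite3
from collections import defaultdict, deque

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vertex (
    id     INTEGER PRIMARY KEY,
    module TEXT NOT NULL              -- hypothetical processing module name
);
CREATE TABLE edge (
    id    INTEGER PRIMARY KEY,
    src   INTEGER NOT NULL REFERENCES vertex(id),
    dst   INTEGER NOT NULL REFERENCES vertex(id),
    label TEXT NOT NULL               -- dataset passed from src to dst
);
""")

# Hypothetical pipeline: read raw data, compute an index, write two outputs.
conn.executemany(
    "INSERT INTO vertex (id, module) VALUES (?, ?)",
    [(1, "read_data"), (2, "compute_index"),
     (3, "write_image"), (4, "write_binary")],
)
# Two parallel edges from vertex 1 to vertex 2: a multigraph.
conn.executemany(
    "INSERT INTO edge (src, dst, label) VALUES (?, ?, ?)",
    [(1, 2, "temperature"), (1, 2, "time_grid"),
     (2, 3, "index"), (2, 4, "index")],
)


def execution_order(conn):
    """Return module names in an order that respects dataset dependencies
    (Kahn's topological sort over the stored graph)."""
    modules = dict(conn.execute("SELECT id, module FROM vertex"))
    indegree = {v: 0 for v in modules}
    successors = defaultdict(list)
    for src, dst in conn.execute("SELECT src, dst FROM edge ORDER BY id"):
        successors[src].append(dst)
        indegree[dst] += 1

    ready = deque(sorted(v for v, d in indegree.items() if d == 0))
    order = []
    while ready:
        v = ready.popleft()
        order.append(modules[v])
        for w in successors[v]:
            indegree[w] -= 1          # one incoming dataset is now available
            if indegree[w] == 0:
                ready.append(w)
    if len(order) != len(modules):
        raise ValueError("pipeline graph contains a cycle")
    return order


print(execution_order(conn))
```

Modules whose inputs are all available at the same time (here the two writers) do not depend on each other, which is the property a backend could exploit to run them in parallel.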


Summary

Introduction

The data analysis process is a set of sequential operations that starts with data search and retrieval and ends with the output of results in the required format. In most studies this procedure is completely or partially manual: a researcher performs each operation independently using various software products, starting from the very beginning every time. Specialized software products might be used to automate this process, eliminating regular routine actions and thereby speeding up the research. An urgent task is to formalize the representation of computing (data processing) pipelines in a convenient and standardized form that facilitates and speeds up their formation, modification, and reuse. This will contribute to the implementation of the current FAIR principles used for the management of scientific data and results (https://www.go-fair.org/fair-principles/), within the framework of any information-. The methodology is quite universal and might be adapted for other information and analytical systems.

General Approach
Data Processing Pipeline Graph Representation
Representation of the Computing Pipeline Graph in the Relational Database
Building the Computing Pipeline Based on the Relational Database
Climatic Index Example
Conclusion
