Abstract
Diffs, a concept known from source code version control systems such as git, is interesting for geospatial, event-based workflows. We investigate how the native mathematical structure of vector geometries can be utilized in order to create a diffing algorithm tailored to geospatial vector data. Diffing algorithms are a well-researched area which dates to the 1970ies; however, we find that geospatial diffing operations tends to be carried out using generic algorithms combined with a pre- and post-processing step. We created GeomDiff, an algorithm and storage format tailored to geospatial vector data. The creation time, apply/undo time, and patch size of GeomDiff was compared to three other generic algorithms by running an online experiment using 2.5 million real-world geometry pairs from OpenStreetMap. We found that the GeomDiff algorithm performs better than or on-par with the alternatives on point-geometries, and complex geometries with a small (< 500) vertex count. We argue that there are both computation time and storage space improvements to be gained by using a tailored diffing algorithm for geospatial vector data. These promising first results encourages further refinement of the algorithm in order to handle complex geometries efficiently as well.
Highlights
In computer science, a diff1 is a set of machineexecutable instructions to transform version n of source code or documentation into version n + 1 [1]
We expect the GeomDiff algorithm to exhibit faster creation, apply- and, undo-time for point, linestring and polygon geometries compared to the other algorithms
By grouping the create time results by vertex count and computing average creation time for each group (Fig. 2 and Fig. 3), we find that all algorithms except the BinaryDiff algorithm show an increase in creation time with increasing number of vertices
Summary
A diff is a set of machineexecutable instructions to transform version n of source code or documentation into version n + 1 [1]. The concept of diffs is an essential component of source code version control systems such as git [2], one of the fundamental building blocks of modern software engineering. The methods used for creating a diff and the format it is stored in will affect both creation time, storage requirements, and apply and undo time. These metrics affect the overall performance and requirements of a diff-based workflow. Using a diffing algorithm and diff storage format tailored to geospatial vector data has the possibility to provide an efficient, performant, and reliable event-based workflow for geospatial data management
Published Version (Free)
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have