Abstract
Batch paleogeographic point rotation (BPPR) is an extensible, PySpark-based method for rotating data points in batches, designed to accelerate the rotation step of paleogeographic reconstruction. Data point rotation is an important part of paleogeographic reconstruction and a significant tool for exploring the co-evolution of Earth and life. However, current point rotation techniques are slow when handling large paleogeographic datasets. This study therefore introduces a parallel-computing framework to construct BPPR. The method combines PySpark and PyGPlates to partition the points and rotate the partitions simultaneously in multiple threads. BPPR rotated 232,277 fossil occurrences from the Cretaceous Period in the Paleobiology Database (PBDB) within 26 s, whereas an alternative GPlates method completed the same task within 96 s. The proposed method supports CSV, Excel, SHP, and other data formats, which avoids the software switching that GPlates-based methods can require. In experiments on synthetic and real paleontological datasets, BPPR was nine times more efficient than GPlates when rotating 900,000 points. This efficiency improvement can significantly benefit data-driven paleogeographic analysis, and the parallel strategy can be broadly applied to massive data analysis in geoscience.
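The partition-and-rotate idea described above can be illustrated with a minimal sketch. It is not the BPPR implementation: it uses only the Python standard library, with a thread pool standing in for PySpark and a hand-written Rodrigues rotation about a single fixed Euler pole standing in for PyGPlates. The function names (`rotate_point`, `partition`, `rotate_batch`) and the `(pole_lat, pole_lon, angle_deg)` pole tuple are hypothetical and chosen for illustration only.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def _to_xyz(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a unit vector."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def rotate_point(lat, lon, pole_lat, pole_lon, angle_deg):
    """Rotate one point about an Euler pole (Rodrigues' rotation formula)."""
    v = _to_xyz(lat, lon)
    k = _to_xyz(pole_lat, pole_lon)
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    dot = sum(a * b for a, b in zip(k, v))
    cross = (k[1] * v[2] - k[2] * v[1],
             k[2] * v[0] - k[0] * v[2],
             k[0] * v[1] - k[1] * v[0])
    r = tuple(v[i] * c + cross[i] * s + k[i] * dot * (1 - c) for i in range(3))
    z = max(-1.0, min(1.0, r[2]))  # guard asin against rounding error
    return (math.degrees(math.asin(z)), math.degrees(math.atan2(r[1], r[0])))


def partition(points, n_parts):
    """Split the point list into at most n_parts contiguous chunks."""
    size = max(1, math.ceil(len(points) / n_parts))
    return [points[i:i + size] for i in range(0, len(points), size)]


def rotate_batch(points, pole, n_parts=4):
    """Rotate every (lat, lon) point, one chunk per worker, preserving order.

    `pole` is a hypothetical (pole_lat, pole_lon, angle_deg) tuple; a real
    reconstruction would look up a rotation per plate and per age instead.
    """
    def rotate_chunk(chunk):
        return [rotate_point(lat, lon, *pole) for lat, lon in chunk]

    with ThreadPoolExecutor(max_workers=n_parts) as executor:
        rotated_chunks = list(executor.map(rotate_chunk, partition(points, n_parts)))
    return [p for chunk in rotated_chunks for p in chunk]
```

Because of CPython's global interpreter lock, the thread pool here shows only the structure of the approach and gives little real speed-up for this pure-Python arithmetic. In a Spark setting, the per-chunk function would more plausibly be passed to something like `RDD.mapPartitions`, so that each partition is rotated on a separate executor.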