Spatiotemporal big data carry information along multiple dimensions, such as space and time, and are characterized by large volume, intricate spatiotemporal relationships, and uneven spatiotemporal distribution. Index structures are among the most important technologies for improving a system's data analysis performance and workload handling. However, existing index structures are difficult to adjust dynamically to data density, which increases maintenance costs and retrieval complexity. At the same time, preserving the proximity of spatiotemporal data in the spatial or temporal dimension is crucial for efficient spatiotemporal analysis. To address these challenges, this paper proposes a learned index method, GLMI (Geohash and piecewise linear model-based index for spatiotemporal data). In the space dimension, GLMI uses dynamic space partitioning based on Geohash to reduce the impact of data skew on index performance. In the time dimension, it constructs a piecewise linear model using the ShrinkingCone algorithm and uses a buffer to support fast writing of spatiotemporal data. On real traffic itinerary and trajectory record datasets, GLMI consumes less space and has a shorter construction time than mainstream traditional high-dimensional indexes and high-dimensional learned indexes such as the ZM index. GLMI also has an advantage in query efficiency.
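To illustrate the time-dimension component, the following is a minimal Python sketch of ShrinkingCone-style segmentation over sorted timestamps, not the GLMI implementation itself. It assumes strictly increasing integer keys and a user-chosen error bound; the names `shrinking_cone`, `lookup`, and `error` are hypothetical and introduced only for illustration. Each segment stores a start key, a start position, and a slope, and a lookup predicts a position and then searches only a window of roughly `error` slots on either side of it.

```python
from bisect import bisect_left, bisect_right


def shrinking_cone(keys, error):
    """Split sorted, strictly increasing keys into linear segments.

    Returns a list of (start_key, start_pos, slope). Within a segment,
    start_pos + slope * (key - start_key) is within `error` of the true
    position of every key (up to floating-point rounding).
    """
    segments = []
    origin_key, origin_pos = keys[0], 0
    lo, hi = 0.0, float("inf")  # current cone of feasible slopes
    for pos in range(1, len(keys)):
        dx = keys[pos] - origin_key
        slope = (pos - origin_pos) / dx
        if lo <= slope <= hi:
            # Point fits: shrink the cone so it stays within +/- error.
            hi = min(hi, (pos + error - origin_pos) / dx)
            lo = max(lo, (pos - error - origin_pos) / dx)
        else:
            # Point falls outside the cone: close the segment, start a new one.
            segments.append((origin_key, origin_pos, (lo + hi) / 2))
            origin_key, origin_pos = keys[pos], pos
            lo, hi = 0.0, float("inf")
    # A segment holding a single key has an unbounded cone; use slope 0.
    last_slope = (lo + hi) / 2 if hi != float("inf") else 0.0
    segments.append((origin_key, origin_pos, last_slope))
    return segments


def lookup(keys, segments, error, key):
    """Return the position of `key` in `keys`, or None if it is absent."""
    starts = [s[0] for s in segments]
    i = bisect_right(starts, key) - 1
    if i < 0:
        return None
    start_key, start_pos, slope = segments[i]
    pred = int(start_pos + slope * (key - start_key))
    # Search only the error window around the prediction (one slot of slack
    # on each side absorbs truncation and floating-point rounding).
    lo = max(0, pred - error - 1)
    hi = min(len(keys), pred + error + 2)
    j = bisect_left(keys, key, lo, hi)
    return j if j < hi and keys[j] == key else None


# Dense timestamps followed by sparse ones, mimicking uneven temporal density.
keys = list(range(0, 100)) + list(range(1000, 2000, 10))
segments = shrinking_cone(keys, error=2)
```

In this sketch a larger `error` yields fewer segments (a smaller model) at the cost of a wider final search window, which is the usual space and lookup-time trade-off of piecewise linear learned indexes.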