Timeseries management systems play an important role in IoT and performance monitoring. As data volumes grow, ingesting data at high throughput while using memory efficiently has become a key requirement for these systems. However, the designs of existing systems, especially their in-memory data structures, suffer from two issues. First, they face a trade-off between memory efficiency and performance. Second, they do not scale: lock contention prevents them from benefiting from parallel insertion and querying. In this paper, we propose ForestTI, a scalable, inverted-index-oriented timeseries management system in which the balance point between memory efficiency and performance can be flexibly adjusted as memory pressure increases. First, we present a two-level inverted index that scales through optimistic lock coupling and whose internal structure can be gradually converted to more memory-efficient representations. Second, we propose a two-level pointer swizzling mechanism that actively swaps out cold posting lists and in-memory timeseries objects as the number of timeseries grows. Finally, we optimize the on-disk data structures (i.e., the write-ahead logs and the LSM-tree) to keep up with the high insertion throughput of the in-memory components. We implemented a ForestTI prototype from scratch in C++. Compared with the storage engine of Prometheus, ForestTI achieves 1.79x higher insertion throughput, 52.1% lower query latency, and 56.9% lower memory usage. The source code of ForestTI is publicly released as open source.
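The abstract credits the index's scalability to optimistic lock coupling. As a minimal sketch of the underlying primitive, and not ForestTI's actual code, the C++ below shows an optimistic version lock: writers take the lock exclusively and bump a version counter, while readers take no lock at all, read optimistically, and retry if the version changed. `OptimisticLock`, `PostingHeader`, `Append`, `Snapshot`, and `CountTornReads` are hypothetical names chosen for illustration; a full lock-coupling traversal would additionally validate a parent node's version after reading the child pointer.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <utility>
#include <vector>

// Optimistic version lock: bit 0 set means "write-locked"; each write
// critical section advances the version by 2 (lock +1, unlock +1).
class OptimisticLock {
 public:
  // Begin an optimistic read; returns false if a writer currently holds the lock.
  bool ReadBegin(uint64_t& version) const {
    version = version_.load(std::memory_order_acquire);
    return (version & 1) == 0;
  }
  // Validate an optimistic read: true iff no writer ran since ReadBegin.
  bool ReadValidate(uint64_t version) const {
    std::atomic_thread_fence(std::memory_order_acquire);
    return version_.load(std::memory_order_relaxed) == version;
  }
  void WriteLock() {
    uint64_t v = version_.load(std::memory_order_relaxed);
    while ((v & 1) != 0 ||
           !version_.compare_exchange_weak(v, v + 1, std::memory_order_acquire,
                                           std::memory_order_relaxed)) {
      std::this_thread::yield();
      v = version_.load(std::memory_order_relaxed);
    }
    // Seqlock-style fence: the odd version is ordered before the protected stores.
    std::atomic_thread_fence(std::memory_order_release);
  }
  void WriteUnlock() { version_.fetch_add(1, std::memory_order_release); }

 private:
  std::atomic<uint64_t> version_{0};
};

// Hypothetical posting-list header; invariant: count and last_id change together.
struct PostingHeader {
  OptimisticLock lock;
  std::atomic<uint64_t> count{0};
  std::atomic<uint64_t> last_id{0};
};

// Writer: append a series id under the exclusive write lock.
void Append(PostingHeader& h, uint64_t id) {
  h.lock.WriteLock();
  h.count.store(h.count.load(std::memory_order_relaxed) + 1, std::memory_order_relaxed);
  h.last_id.store(id, std::memory_order_relaxed);
  h.lock.WriteUnlock();
}

// Reader: takes no lock; retries until it observes a consistent snapshot.
std::pair<uint64_t, uint64_t> Snapshot(const PostingHeader& h) {
  for (;;) {
    uint64_t v;
    if (!h.lock.ReadBegin(v)) { std::this_thread::yield(); continue; }
    uint64_t c = h.count.load(std::memory_order_relaxed);
    uint64_t l = h.last_id.load(std::memory_order_relaxed);
    if (h.lock.ReadValidate(v)) return {c, l};
  }
}

// One writer appends ids 1..appends (so count == last_id after each append)
// while `readers` threads read concurrently; returns inconsistent snapshots seen.
int CountTornReads(uint64_t appends, int readers) {
  PostingHeader h;
  std::atomic<bool> done{false};
  std::atomic<int> torn{0};
  std::vector<std::thread> threads;
  for (int r = 0; r < readers; ++r) {
    threads.emplace_back([&] {
      while (!done.load(std::memory_order_acquire)) {
        auto [c, l] = Snapshot(h);
        if (c != l) torn.fetch_add(1, std::memory_order_relaxed);
      }
    });
  }
  for (uint64_t id = 1; id <= appends; ++id) Append(h, id);
  done.store(true, std::memory_order_release);
  for (auto& t : threads) t.join();
  return torn.load();
}
```

Because readers never write to shared memory, they do not contend on a cache line with each other, which is the property that lets optimistic schemes scale with parallel querying; the cost is that a reader may occasionally retry when it races with a writer.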