Dual-PISA: An index for aggregation operations on time series data

Jialin Qiao, Xiangdong Huang, Jianmin Wang, Raymond K Wong

doi:10.1016/j.is.2019.101427

Abstract

Aggregation operations play an essential role in time series database management. As the volume of data grows, current solutions such as summary tables and MapReduce-based methods struggle to answer aggregation queries with low latency. Other approaches, such as segment tree-based methods, have poor insertion performance when the data size exceeds the available memory. This paper proposes a Persistent Index for Segmented Aggregations (PISA), which offers fast insertion and low-latency aggregation queries. PISA uses a forest to overcome the insertion performance disadvantage of traditional segment trees. By defining two kinds of tags, namely the code number and the serial number, we propose an algorithm that accelerates queries by avoiding unnecessary reads of data on disk. Additionally, we extend PISA to Dual-PISA to tolerate a range of out-of-order data, which is very important in real-world deployments. Dual-PISA is stored on disk and is highly memory-efficient: it needs only a few hundred bytes of memory for billions of data points. Dual-PISA can be easily implemented on both traditional databases and NoSQL systems. On a commodity server, it answers aggregation queries within milliseconds for a time range that contains tens of billions of data points.
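To make the forest idea in the abstract concrete, the following is a minimal in-memory sketch, not the paper's algorithm. It assumes points arrive in order, indexes them by arrival position rather than timestamp, supports only sums, and does not model the code number or serial number tags, on-disk storage, or the Dual-PISA extension. The names `Node` and `ForestSumIndex` are hypothetical. The sketch keeps a forest of perfect binary trees and merges equal-sized trees on append, so an insertion never rebuilds one large tree.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """A tree node covering positions start..end (inclusive)."""
    start: int
    end: int
    total: float                      # pre-computed sum over start..end
    left: Optional["Node"] = None
    right: Optional["Node"] = None


class ForestSumIndex:
    """Hypothetical append-only forest of perfect binary trees.

    Illustration only: in-memory, ordered inserts, sum aggregation.
    """

    def __init__(self) -> None:
        self.roots: List[Node] = []   # tree sizes strictly decrease left to right
        self.count = 0                # number of points appended so far

    @staticmethod
    def _size(node: Node) -> int:
        return node.end - node.start + 1

    def append(self, value: float) -> None:
        # Each new point starts as a single-leaf tree.
        node = Node(self.count, self.count, value)
        self.count += 1
        # Merge with the rightmost tree while sizes match,
        # like carrying in a binary counter.
        while self.roots and self._size(self.roots[-1]) == self._size(node):
            left = self.roots.pop()
            node = Node(left.start, node.end, left.total + node.total, left, node)
        self.roots.append(node)

    def range_sum(self, lo: int, hi: int) -> float:
        """Sum of the values at positions lo..hi (inclusive)."""
        return sum(self._query(root, lo, hi) for root in self.roots)

    def _query(self, node: Optional[Node], lo: int, hi: int) -> float:
        if node is None or hi < node.start or lo > node.end:
            return 0.0                # no overlap with the query range
        if lo <= node.start and node.end <= hi:
            return node.total         # fully covered: use the stored aggregate
        return self._query(node.left, lo, hi) + self._query(node.right, lo, hi)
```

In this sketch, n appended points leave at most about log2(n) roots in the forest, and a range query descends only into the trees and subtrees that partially overlap the requested range, reusing stored sums everywhere else.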

Full Text