Abstract

Today's internet applications generate massive temporal data anywhere and anytime. Although several disk-based temporal systems are available, they suffer from poor I/O performance, especially in intelligent applications deployed in cloud and edge environments. Processing temporal operations with low latency and high throughput is therefore a crucial problem for efficient data processing. This paper proposes Timo, a distributed in-memory temporal query and analysis model for big temporal data. First, a space-efficient temporal index is proposed that delivers more efficient query performance with less memory than state-of-the-art methods. Second, exploiting the temporal locality of temporal queries, Timo introduces a partitioning mechanism based on the forward-scan algorithm to improve query throughput. Third, several optimization strategies improve the execution of temporal queries and analyses, reducing the size of intermediate results and further increasing throughput. Finally, we implement Timo on the Apache Spark platform, extending the Spark Dataset API with temporal query and analysis functions for users. Extensive experimental results show that Timo outperforms other Spark-based temporal systems in terms of both query latency and throughput.
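The abstract mentions that Timo's partitioner builds on the forward-scan algorithm for interval overlap. The paper's own implementation is not shown here; as a rough illustration only, the sketch below implements a generic forward-scan (plane-sweep) interval join in Python, where both inputs are sorted by start time and two cursors sweep forward so that each overlapping pair is reported exactly once. The function name and closed-interval convention are assumptions for this example.

```python
def forward_scan_join(r_intervals, s_intervals):
    """Report all overlapping pairs between two sets of closed intervals
    (start, end) using the forward-scan plane-sweep technique: sort both
    inputs by start, then for each interval whose start comes first, scan
    forward in the other set while starts fall inside its extent."""
    R = sorted(r_intervals)
    S = sorted(s_intervals)
    result = []
    i = j = 0
    while i < len(R) and j < len(S):
        if R[i][0] <= S[j][0]:
            # R[i] starts first: every S interval whose start lies within
            # R[i]'s extent overlaps R[i].
            k = j
            while k < len(S) and S[k][0] <= R[i][1]:
                result.append((R[i], S[k]))
                k += 1
            i += 1
        else:
            # S[j] starts first: symmetric forward scan over R.
            k = i
            while k < len(R) and R[k][0] <= S[j][1]:
                result.append((R[k], S[j]))
                k += 1
            j += 1
    return result
```

For example, `forward_scan_join([(1, 5), (10, 12)], [(3, 8), (11, 15)])` yields the two overlapping pairs `((1, 5), (3, 8))` and `((10, 12), (11, 15))`. Because both inputs are consumed in start order, the sweep benefits from exactly the temporal locality the abstract describes: queries touching nearby time ranges scan adjacent data.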
