Abstract

Distributed data stream management systems (DDSMS) are often used to analyze and process road network data streams. A DDSMS consists of an upper-layer relational query system (RQS) and a lower-layer stream processing system (SPS). Because a DDSMS usually has to serve multiple query requests at once, the queries that users submit to the RQS are converted into different query tasks on the SPS, and the resulting query plans are executed in parallel on different nodes by partitioning the stream according to the values of specific attributes, or partitioning keys. However, executing multiple query plans can cause redundant and repetitive partitioning. This article presents a framework for data stream partitioning based on runtime correlation discovery. It combines runtime positive-correlation partitioning (RPC-partitioning) with clustering partitioning (Clu-partitioning). In RPC-partitioning, we first use batching schemes to reduce the number of output buffers and then partition data streams using the correlation between different partitioning keys. In Clu-partitioning, we re-partition skewed data streams by clustering. Experiments with two workloads of road network data streams show that our method reduces the network communication cost by 16% to 20% and improves the throughput of the DDSMS. These results demonstrate the effectiveness of our method, especially in reducing operational cost in cloud environments.
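
The idea of partitioning once on correlated keys can be illustrated with a minimal sketch. This is not the article's RPC-partitioning algorithm: it omits batching and Clu-partitioning, and the attribute names (`segment`, `district`), the helper functions, and the node count are all hypothetical. It assumes one query partitions on a fine-grained key and another on a coarser key, learns from a sample how strongly the fine key determines the coarse key, and then routes each tuple by the coarse value predicted from its fine key, so that a single partitioning pass can serve both queries when the correlation is strong.

```python
from collections import Counter, defaultdict
import zlib


def stable_hash(value, num_nodes):
    """Deterministic hash of a key value onto a node index."""
    return zlib.crc32(str(value).encode("utf-8")) % num_nodes


def learn_key_mapping(sample, fine_key, coarse_key):
    """Map each fine-key value to its most frequent coarse-key value.

    Also returns the fraction of sampled tuples that agree with the
    mapping, used here as a simple (assumed) correlation strength.
    """
    counts = defaultdict(Counter)
    for t in sample:
        counts[t[fine_key]][t[coarse_key]] += 1
    mapping = {f: c.most_common(1)[0][0] for f, c in counts.items()}
    agree = sum(c[mapping[f]] for f, c in counts.items())
    strength = agree / len(sample) if sample else 0.0
    return mapping, strength


def correlated_partition(tuples, mapping, fine_key, coarse_key, num_nodes):
    """Route each tuple by the coarse value predicted from its fine key.

    Fine-key values missing from the mapping fall back to the tuple's
    own coarse-key value.
    """
    nodes = defaultdict(list)
    for t in tuples:
        coarse = mapping.get(t[fine_key], t[coarse_key])
        nodes[stable_hash(coarse, num_nodes)].append(t)
    return nodes


# Hypothetical road-network stream: each segment lies in one district.
stream = [
    {"segment": s, "district": s // 10, "speed": 30 + s}
    for s in range(100)
    for _ in range(2)
]

mapping, strength = learn_key_mapping(stream, "segment", "district")
nodes = correlated_partition(stream, mapping, "segment", "district", 4)
```

When the measured strength is close to 1, tuples that share a segment and tuples that share a district both end up on a single node, so neither query needs its own shuffle. With weaker correlation, some tuples would still have to be re-routed for the query on the coarse key.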

