Abstract
With the wide application of data streams in fields such as sensor network monitoring and internet traffic control, query processing over data streams has become increasingly important. In these applications, multiple aggregate queries are registered in the system, each with its own sliding window size and frequency upper bound. Sharing results among these queries is a challenge. Prior work studies how to detect the common tasks of these queries and share results by computing each common task only once. Hybrid scheduling was the first to address this problem, using the earliest-deadline-first (EDF) method, but that work did not present a method for computing the schedule. We formulate the scheduling problem for multiple aggregate queries with different sliding window sizes and different frequency upper bounds over data streams, and propose a combination rule to classify these queries. We then present an efficient scheduling algorithm that decides whether a query should be executed more often than necessary, as long as the interval between two consecutive executions is less than the frequency upper bound. We also combine our scheduling algorithm with EDF to handle underloaded and overloaded situations. An experimental study shows that our scheduling algorithms outperform both no scheduling and EDF in terms of the number of scanned tuples, throughput, and latency.
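To make the idea of executing a query more often than necessary concrete, the following is a minimal sketch, not the algorithm proposed in the paper. It assumes that the frequency upper bound can be read as the largest allowed gap (in ticks) between two consecutive executions, and that queries executed at the same instant share a single scan of the stream. The function names, the bounds 4 and 6, and the horizon of 24 ticks are all hypothetical.

```python
def execution_times(interval, horizon):
    """Instants in (0, horizon] at which a query run every `interval` ticks executes."""
    return set(range(interval, horizon + 1, interval))


def scan_passes(intervals, horizon):
    """Count distinct execution instants over the horizon.

    Assumption: queries that execute at the same instant share one scan,
    so the number of distinct instants approximates the number of scans.
    """
    instants = set()
    for interval in intervals:
        instants |= execution_times(interval, horizon)
    return len(instants)


# Hypothetical bounds: query A may wait at most 4 ticks, query B at most 6.
horizon = 24

# Each query runs exactly at its own bound, so few executions coincide.
independent = scan_passes([4, 6], horizon)

# Query B runs early, every 4 ticks, which still respects its bound of 6
# and lets every execution of B coincide with an execution of A.
aligned = scan_passes([4, 4], horizon)

print(independent, aligned)  # prints "8 6"
```

Under these assumptions, running the second query more often than its bound requires reduces the number of scans from 8 to 6, because each of its executions can be shared with the first query.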