Abstract

Many data stream sources are prone to dramatic spikes in volume, with data items arriving in bursts. Peak load during a spike can be orders of magnitude higher than typical load, and processing every arriving data item would exceed the available memory. It therefore becomes necessary to shed load by dropping some fraction of the unprocessed data items during a spike. We consider the problem of load shedding for continuous sliding-window join-aggregation queries over data streams when the available system memory may be insufficient to hold the entire query state, and we model load shedding as the insertion of drop operators into the query plan. We then present a new semantic load shedding strategy. The key idea of the strategy is to partition the domain of the join attribute into sub-domains and to filter out input tuples based on their join values by maintaining simple data stream statistics.
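
The partition-and-filter idea can be illustrated with a minimal sketch. The abstract does not specify how the sub-domains are chosen, which statistics are kept, or how the drop decision is made, so everything below is an assumption made for illustration: equal-width sub-domains over a numeric join attribute, per-stream tuple counts as the statistics, and a rule that under overload admits only tuples falling in the sub-domains with the largest product of counts (a rough proxy for join output). The class name `SubdomainShedder`, its parameters, and the stream labels "R" and "S" are hypothetical.

```python
from collections import Counter


class SubdomainShedder:
    """Toy semantic load shedder for a two-stream equi-join (illustrative only)."""

    def __init__(self, lo, hi, num_parts, keep_parts):
        self.lo = lo
        self.width = (hi - lo) / num_parts   # assumed: equal-width sub-domains
        self.num_parts = num_parts
        self.keep_parts = keep_parts         # sub-domains admitted under overload
        # Assumed statistics: per-stream tuple counts for each sub-domain.
        self.counts = {"R": Counter(), "S": Counter()}

    def part(self, value):
        """Map a join-attribute value to its sub-domain index (clamped)."""
        idx = int((value - self.lo) / self.width)
        return min(max(idx, 0), self.num_parts - 1)

    def observe(self, stream, value):
        """Update the statistics with one arriving tuple's join value."""
        self.counts[stream][self.part(value)] += 1

    def kept_parts(self):
        """Sub-domains ranked by count product; ties broken by lower index."""
        def score(p):
            return self.counts["R"][p] * self.counts["S"][p]
        ranked = sorted(range(self.num_parts), key=lambda p: (-score(p), p))
        return set(ranked[: self.keep_parts])

    def admit(self, value, overloaded):
        """Act as a drop operator: return False if the tuple should be shed."""
        if not overloaded:
            return True
        return self.part(value) in self.kept_parts()


# Usage: four sub-domains over [0, 100), keep only the most productive one.
shedder = SubdomainShedder(lo=0, hi=100, num_parts=4, keep_parts=1)
for v in [5, 7, 60, 61, 62]:
    shedder.observe("R", v)
for v in [3, 55, 58, 70]:
    shedder.observe("S", v)

keep_65 = shedder.admit(65, overloaded=True)   # falls in the busiest sub-domain
keep_10 = shedder.admit(10, overloaded=True)   # falls in a less productive one
```

In this sketch a tuple is dropped because of where its join value falls, not at random, which is what distinguishes semantic shedding from uniform random dropping. A real system would also age the statistics as the sliding window advances; that is omitted here.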
