Abstract

In an asynchronous data stream, the data items may be out of order with respect to their original timestamps. This paper studies the space complexity required by a data structure to maintain such a data stream so that it can approximate the set of frequent items over a sliding time window with sufficient accuracy. Prior to our work, the best solution was given by Cormode et al. [1], who gave an O(1/ε log W log(εB/log W) min{log W, 1/ε} log |U|)-space data structure that can approximate the frequent items within an ε error bound, where W and B are parameters of the sliding window, and U is the set of all possible item names. We give a more space-efficient data structure that requires only O(1/ε log W log(εB/log W) log log W) space.

Highlights

  • Identifying frequent items in a massive data stream has many applications in data mining and network monitoring, and the problem has been studied extensively [2,3,4,5]

  • Recent interest has shifted from the statistics of the whole data stream to those of a sliding window of recent data [6,7,8,9]

  • Existing research has focused on designing space-efficient data structures to support finding the approximate frequent items


Summary

Introduction

Identifying frequent items in a massive data stream has many applications in data mining and network monitoring, and the problem has been studied extensively [2,3,4,5]. This paper gives a more space-efficient data structure for answering any ε-approximate frequent item set query.
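To make the query semantics concrete, here is a minimal Python sketch of an exact, linear-space baseline together with a checker for approximate answers. It is not the data structure proposed in the paper. It assumes the common formulation in which, for a threshold φ, an answer must contain every item whose count in the window is at least φN and no item whose count is below (φ − ε)N, where N is the number of items in the window. The names NaiveWindowCounter, is_valid_approx_answer, phi and eps are hypothetical and chosen only for illustration.

```python
from collections import Counter


class NaiveWindowCounter:
    """Exact baseline (an illustration only): stores every (timestamp, item) pair.

    Space is linear in the stream length, which is exactly what a
    space-efficient structure is meant to avoid.
    """

    def __init__(self, window):
        self.window = window   # W: length of the sliding time window
        self.records = []      # (timestamp, item) pairs in arrival order

    def insert(self, timestamp, item):
        # Arrival order may differ from timestamp order (asynchronous stream).
        self.records.append((timestamp, item))

    def counts(self, now):
        # Assumed window convention: timestamps in (now - W, now].
        lo = now - self.window
        return Counter(item for ts, item in self.records if lo < ts <= now)

    def frequent(self, now, phi):
        # Exact set of items with count >= phi * N in the current window.
        c = self.counts(now)
        total = sum(c.values())
        return {item for item, n in c.items() if n >= phi * total}


def is_valid_approx_answer(answer, counts, phi, eps):
    """Check an answer against the assumed eps-approximate criterion."""
    total = sum(counts.values())
    answer = set(answer)
    # Every item with count >= phi * N must be reported.
    has_all_frequent = all(
        x in answer for x, n in counts.items() if n >= phi * total
    )
    # No reported item may have count below (phi - eps) * N.
    no_infrequent = all(
        counts.get(x, 0) >= (phi - eps) * total for x in answer
    )
    return has_all_frequent and no_infrequent
```

Because the baseline keeps every record and filters by timestamp at query time, out-of-order arrivals need no special handling. The cost is linear space, which is the quantity the bounds in the abstract are meant to reduce.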

