Abstract

The problem of finding frequent items in a data stream has recently been well studied. However, some applications, such as HTTP log analysis, also need to analyze the correlations among frequent items in data streams. In this paper, we investigate the problem of finding correlated items based on the concept of unexpectedness: two items x and y are correlated if both are frequent and their actual number of co-occurrences in the data stream differs significantly from the expected value, which can be computed from the frequencies of x and y. Building on the Space-Saving algorithm [1], we propose a new one-pass algorithm, Stream-Correlation, to discover correlated item pairs. The key part of our algorithm is estimating the frequency of item co-occurrences efficiently with a small memory footprint. The estimation error can be tightly bounded by controlling the memory space. Experimental results show the effectiveness and efficiency of the algorithm.
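The abstract does not spell out how co-occurrences are counted or what counts as "significantly different", so the following is only a minimal sketch under stated assumptions, not the Stream-Correlation algorithm. It assumes the stream is a sequence of transactions (for example, the set of URLs requested in one HTTP session), takes the expected co-occurrence count of x and y under independence to be f(x)·f(y)/n for n transactions, and flags a pair when the observed-to-expected ratio is at least `min_ratio` or at most `1/min_ratio`. Item frequencies come from a plain Space-Saving summary [1] with k counters; pair counts use a second Space-Saving summary, which is a naive stand-in for the paper's co-occurrence estimator. The names `SpaceSaving`, `correlated_pairs`, `support`, and `min_ratio` are hypothetical.

```python
from itertools import combinations


class SpaceSaving:
    """Space-Saving summary that monitors at most k items."""

    def __init__(self, k):
        self.k = k
        self.counts = {}  # monitored item -> estimated count (never below its true count)
        self.errors = {}  # monitored item -> maximum overestimation of that count

    def add(self, item):
        if item in self.counts:
            self.counts[item] += 1
        elif len(self.counts) < self.k:
            self.counts[item] = 1
            self.errors[item] = 0
        else:
            # Evict the item with the smallest count; the newcomer takes over
            # that count plus one, and the old count becomes its error bound.
            victim = min(self.counts, key=self.counts.get)
            min_count = self.counts.pop(victim)
            del self.errors[victim]
            self.counts[item] = min_count + 1
            self.errors[item] = min_count

    def estimate(self, item):
        # Unmonitored items report 0.
        return self.counts.get(item, 0)


def correlated_pairs(transactions, k, support, min_ratio):
    """Flag frequent pairs whose observed co-occurrence count differs from
    the independence estimate f(x) * f(y) / n by a factor of min_ratio or more.
    Hypothetical criterion; not the paper's Stream-Correlation algorithm."""
    items, pairs = SpaceSaving(k), SpaceSaving(k)
    n = 0
    for transaction in transactions:  # single pass over the stream
        n += 1
        distinct = sorted(set(transaction))
        for x in distinct:
            items.add(x)
        for pair in combinations(distinct, 2):  # pairs keyed as (smaller, larger)
            pairs.add(pair)
    if n == 0:
        return []
    frequent = sorted(x for x, c in items.counts.items() if c >= support * n)
    result = []
    for x, y in combinations(frequent, 2):
        expected = items.estimate(x) * items.estimate(y) / n
        observed = pairs.estimate((x, y))
        ratio = observed / expected
        if ratio >= min_ratio or ratio <= 1 / min_ratio:
            result.append((x, y, observed, expected))
    return result


print(correlated_pairs([["a", "b"], ["a", "b"], ["c"], ["c"]], k=10, support=0.5, min_ratio=2.0))
# [('a', 'b', 2, 1.0), ('a', 'c', 0, 1.0), ('b', 'c', 0, 1.0)]
```

In the example, a and b always appear together, so their pair is flagged as more frequent than independence predicts, while the pairs involving c never co-occur and fall below 1/min_ratio. Taking `min` over a dictionary costs O(k) per eviction; the Stream-Summary structure used with Space-Saving [1] makes each update constant time. With k counters over N insertions, every estimated count exceeds the true count by at most N/k.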

