Abstract

Abstract Reliably tracking large network flows in order to determine so-called elephant flows, also known as heavy hitters or frequent items, is a common data mining task. Indeed, this kind of information is crucial in many different contexts and applications. Since storing all of the traffic is impossible, owing to the fact that flows arrive as an unbounded, infinite length stream, many different algorithms have been designed for traffic measurement using only a limited amount of memory. CountMax is a recently published algorithm that solves this problem approximately by combining ideas from the CountMin and MisraGries algorithms. In this paper we introduce CMSS which cleverly combines ideas from the CountMin and Space Saving algorithms. We compare CMSS and CountMax on both synthetic and real datasets. The experimental results show that both algorithms achieve 100% recall, and can retrieve the frequent items without false negatives even with a limited amount of budgeted memory. Regarding the precision, CMSS proves to be robust and independent from all of the parameters, whilst CountMax is severely affected by false positives. Finally, both algorithms provide, in practice, the same frequency estimation quality. It follows that CMSS outperforms CountMax with regard to overall accuracy whilst providing the same frequency estimation quality.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call