Abstract

In high-speed networks, an important task is to process the IP packet stream with limited memory while measuring statistical metrics of interest. Many algorithms have been proposed to estimate the cardinality (i.e., the number of distinct elements) of a single data stream, but the problem remains a great challenge when a stream contains numerous sub-streams, called flows. In this paper, we focus on the problem of designing a generic data structure that measures multiple types of per-flow statistics in a high-speed stream, including per-flow cardinality, the top-K super-spreaders (the flows with the greatest cardinalities), per-flow cardinality moments, and the per-flow cardinality distribution. Previous solutions for generic measurement mainly focus on frequency-related statistics, whereas this paper takes a step further by supporting deduplication, i.e., cardinality-related measurement. To address this new problem, we propose a generic sketch named M2D. The main challenge is that the per-flow cardinality distribution is often highly skewed, with a small proportion of super-spreaders. To tame this skewness, we adopt an adjustable progressive sampling technique, which samples subsets of flows with exponentially decreasing probability according to their cardinalities. Based on the sampled super-spreaders, we estimate the moments of per-flow cardinalities of different orders. Finally, we apply the method of moments to reconstruct the per-flow cardinality distribution without any a priori knowledge of its functional form. Our experiments show that M2D achieves higher memory efficiency (average memory savings of 38%) and more accurate distribution estimation (improvements from 2% to 98%) than other algorithms.
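The abstract does not give M2D's actual sampling schedule or estimators, so the following is only a minimal, hypothetical sketch of the general principle behind estimating cardinality moments from a cardinality-dependent sample. If each flow with cardinality c is kept with a known probability p(c), then summing c^k / p(c) over the kept flows gives an unbiased (Horvitz-Thompson) estimate of the k-th power sum over all flows; dividing by the estimated flow count (k = 0) yields the k-th moment. The sketch assumes exact cardinalities for the kept flows (a real sketch such as M2D would itself estimate them), and the schedule `hypothetical_keep_prob` and its `threshold` parameter are illustrative placeholders, not the paper's.

```python
import math
import random
from typing import Callable, Dict

KeepProb = Callable[[int], float]


def hypothetical_keep_prob(c: int, threshold: int = 64) -> float:
    """Placeholder schedule (an assumption, NOT M2D's): flows with cardinality
    >= threshold are always kept; each halving of the cardinality below the
    threshold halves the keep probability, thinning out the many small flows."""
    if c >= threshold:
        return 1.0
    return 2.0 ** -math.ceil(math.log2(threshold / c))


def sample_flows(cards: Dict[str, int], keep_prob: KeepProb,
                 rng: random.Random) -> Dict[str, int]:
    """Keep each flow independently with probability keep_prob(cardinality)."""
    return {f: c for f, c in cards.items() if rng.random() < keep_prob(c)}


def estimate_power_sum(sample: Dict[str, int], k: int,
                       keep_prob: KeepProb) -> float:
    """Horvitz-Thompson estimate of sum(c**k) over ALL flows: each kept flow is
    weighted by 1/p so that it stands in for the similar flows that were dropped."""
    return sum(c ** k / keep_prob(c) for c in sample.values())


def exact_power_sum(cards: Dict[str, int], k: int) -> float:
    """Ground truth for comparison: sum of c**k over every flow."""
    return float(sum(c ** k for c in cards.values()))


if __name__ == "__main__":
    # Skewed toy population: many tiny flows and a few super-spreaders.
    cards = {f"small{i}": 1 for i in range(1000)}
    cards.update({f"mid{i}": 10 for i in range(100)})
    cards.update({f"big{i}": 200 for i in range(5)})

    rng = random.Random(7)
    trials = 500
    for k in (0, 1, 2):
        mean_est = sum(
            estimate_power_sum(sample_flows(cards, hypothetical_keep_prob, rng),
                               k, hypothetical_keep_prob)
            for _ in range(trials)) / trials
        print(f"k={k}: exact={exact_power_sum(cards, k):.0f} "
              f"mean estimate={mean_est:.0f}")
```

Averaged over many runs, the estimate should track the exact power sum for each order k; a single run can be noisy because the small flows are sampled at low rates, which is the variance paid for storing far fewer of them.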
