Abstract

Packet classification is used primarily by network devices, such as routers and firewalls, to apply additional processing, such as packet filtering and Quality-of-Service (QoS), to specific subsets of network packets. In decision tree-based packet classification systems, packets are classified by searching a tree data structure. Tree search is challenging because it requires many unpredictable and irregular memory accesses. Since packet classification is a per-packet operation and the memory latency caused by cache and TLB misses is considerable, any technique that reduces these misses is useful in practice for improving lookup time. In this paper, we present an efficient memory layout for the tree data structure that ensures optimal data movement among the different levels of the memory hierarchy on general-purpose processors. In particular, for a given node size, the proposed layout minimizes the number of cache lines (and memory pages) accessed, resulting in fewer cache and TLB misses. This reduction directly improves lookup performance. A decision tree stored in the proposed layout can also exploit the computing power of multi-core architectures by leveraging data- and thread-level parallelism. Experimental results on two state-of-the-art processors show that significant performance improvements (40–55% faster lookups) and near-linear speedup (3.8× on quad cores) are achievable by applying the proposed memory layout to the packet classification tree.
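
The abstract does not spell out the layout itself. As a rough illustration of the general idea only, and not the authors' actual scheme, the sketch below packs each two-level subtree of a binary decision tree into a single 64-byte cache line, so a root-to-leaf traversal touches roughly half as many cache lines as a pointer-based layout that places each node independently. All type and field names (`PNode`, `FNode`, `split_value`, and so on) are hypothetical, and the single-field binary split is a simplification of a real multi-field classification tree.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NONE UINT32_MAX   /* sentinel: no child */

/* Original pointer-based decision tree (hypothetical shape). */
typedef struct PNode {
    uint32_t split_value;
    struct PNode *left, *right;
} PNode;

/* Flat 16-byte node: four nodes fit exactly in one 64-byte cache line. */
typedef struct {
    uint32_t split_value;
    uint32_t left, right; /* indices into the flat array, NONE if absent */
    uint32_t pad;         /* keeps blocks line-sized and line-aligned    */
} FNode;

typedef struct {
    FNode   *nodes;
    uint32_t count;
} Flat;

/* Place a depth-2 subtree (parent plus its two children) in one reserved
 * 4-slot block; grandchild subtrees recursively get blocks of their own. */
static uint32_t place(Flat *f, const PNode *p)
{
    uint32_t base = f->count;   /* multiple of 4, so block is line-aligned */
    f->count += 4;

    f->nodes[base].split_value = p->split_value;
    f->nodes[base].left  = p->left  ? base + 1 : NONE;
    f->nodes[base].right = p->right ? base + 2 : NONE;

    const PNode *kids[2] = { p->left, p->right };
    for (int k = 0; k < 2; k++) {
        if (!kids[k]) continue;
        uint32_t ci = base + 1u + (uint32_t)k;
        f->nodes[ci].split_value = kids[k]->split_value;
        f->nodes[ci].left  = kids[k]->left  ? place(f, kids[k]->left)  : NONE;
        f->nodes[ci].right = kids[k]->right ? place(f, kids[k]->right) : NONE;
    }
    return base;
}

static uint32_t count_nodes(const PNode *p)
{
    return p ? 1 + count_nodes(p->left) + count_nodes(p->right) : 0;
}

/* Copy the pointer-based tree into a cache-line-aligned flat array. */
Flat build_flat(const PNode *root)
{
    Flat f = { NULL, 0 };
    if (!root) return f;
    size_t cap = (size_t)count_nodes(root) * 4;      /* <= one block/node */
    f.nodes = aligned_alloc(64, cap * sizeof(FNode)); /* size is 64-multiple */
    if (!f.nodes) return f;
    memset(f.nodes, 0, cap * sizeof(FNode));
    place(&f, root);
    return f;
}

/* Walk the flat tree: every parent-to-child step inside a block stays in
 * the same cache line, so two tree levels cost one line access. */
uint32_t lookup(const Flat *f, uint32_t key)
{
    uint32_t i = 0;
    for (;;) {
        const FNode *n = &f->nodes[i];
        uint32_t next = (key <= n->split_value) ? n->left : n->right;
        if (next == NONE) return i;   /* leaf reached: return node index */
        i = next;
    }
}
```

Padding each three-node block to a full line trades about 25% extra space for the guarantee that no depth-2 subtree straddles two cache lines; the same blocking idea applies at page granularity to reduce TLB misses, which is the spirit of the layout the abstract describes.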
