Commodity Platforms Research Articles

As new network attacks emerge and network traffic grows in volume, the need to inspect network traffic at high rates keeps increasing. Pattern matching is the core of many security applications that inspect network traffic, such as Network Intrusion Detection. It is also a major performance bottleneck for those applications: it has been shown to account for more than 70% of the total running time of Intrusion Detection Systems. Although numerous efficient approaches to this problem have been proposed for custom hardware, it is challenging for pattern matching algorithms to benefit from the advances in commodity hardware. This becomes even more relevant with the adoption of Network Function Virtualization, which moves network services, such as Network Intrusion Detection, to the cloud, where scaling on commodity hardware is key for performance. In this paper, we tackle the problem of pattern matching and show how to leverage the architectural features found in commodity platforms. We present efficient algorithmic designs that achieve good cache locality and use modern vectorization techniques to exploit data parallelism within each core. First, we identify properties of pattern matching that make it fit for vectorization and show how to use them in the algorithmic design. Second, we build on an earlier, cache-aware algorithmic design and show how we apply cache locality combined with SIMD gather instructions to pattern matching. Third, we complement our algorithms with an analytical model that predicts their performance and can be used to easily evaluate alternative designs. We evaluate our algorithmic design with open data sets of real-world network traffic: our results on two different platforms, Haswell and Xeon-Phi, show a speedup of 1.8x and 3.6x, respectively, over Direct Filter Classification (DFC), a recently proposed pattern matching algorithm by Choi et al. that exploits cache locality, and a speedup of more than 2.3x over Aho–Corasick, a widely used algorithm in today’s Intrusion Detection Systems. Finally, we utilize highly parallel hardware platforms, evaluate the scalability of our algorithms, and compare it to parallel implementations of DFC and Aho–Corasick, achieving processing throughput of up to 45 Gbps and close to 2 times higher throughput than Aho–Corasick.

SUMMARY: The representation tree lies at the heart of the algorithm of Multiple Relatively Robust Representations for computing orthogonal eigenvectors of a symmetric tridiagonal matrix without Gram–Schmidt. A representation tree describes the incremental shift relations between relatively robust representations of eigenvalue clusters of an unreduced tridiagonal matrix, which are needed to strongly separate close eigenvalues in the relative sense. At the bottom of the representation tree, each leaf defines a relatively isolated eigenvalue to high relative accuracy. The shape of the representation tree plays a pivotal role for complexity and available parallelism: a deeper tree consisting of multiple levels of nodes involves tasks associated with more work (i.e., eigenvalue refinement to resolve eigenvalue clusters) and less parallelism (i.e., a longer critical path as well as potential data movement and synchronization). An embarrassingly parallel, ideal tree, on the other hand, consists of a root and leaves only. As highly parallel hybrid graphics processing unit/multicore platforms with large memory now become available as commodity platforms, exploiting parallelism in traditional algorithms becomes key to modernizing the components of standard software libraries such as LAPACK. This paper focuses on LAPACK's Multiple Relatively Robust Representations algorithm and investigates the critical case where a representation tree contains a long sequential chain of large (fat) nodes that hamper parallelism. This key problem needs to be addressed because it concerns all sorts of computing environments: distributed computing, symmetric multiprocessor, and hybrid graphics processing unit/multicore architectures. We present an improved representation tree that often offers a significantly shorter critical path and a finer computational granularity of smaller tasks that are easier to schedule. In a study of selected synthetic and application matrices, we show that an average 75% reduction in the length of the critical path and an 82% reduction in task granularity can be achieved. Copyright © 2011 John Wiley & Sons, Ltd.

Articles published on Commodity Platforms

Coordination of interests of participants in digital commodity platforms (marketplaces): legal problems

Mediating queer masculinities through alternative music from Palestine

Practical Application of Platform Comment Sentiment Binary Classification Based on Deep Learning

Orchestrating Energy-Efficient vRANs: Bayesian Learning and Experimental Results

PISCOT: A Pipelined Split-Transaction COTS-Coherent Bus for Multi-Core Real-Time Systems

Interactive spatio-temporal exploration of massive time-varying rectilinear scalar volumes based on a variable bit-rate sparse representation over learned dictionaries

Multiple pattern matching for network security applications: Acceleration through vectorization

A scalable and accurate distributed traffic generator with Fourier transformed distribution over multiple commodity platforms

Enif-Lang: A Specialized Language for Programming Network Functions on Commodity Hardware

Achieving One Billion Key-Value Requests per Second on a Single Server

A parallel min-cut algorithm using iteratively reweighted least squares targeting at problems with floating-point edge weights

NetVM: High Performance and Flexible Networking Using Virtualization on Commodity Platforms

Design and optimizations for efficient regular expression matching in DPI systems

State‐of‐the‐Art in Compressed GPU‐Based Direct Volume Rendering

Real-Time Autonomous Structural Change Detection Onboard Wireless Sensor Platforms

Scaling LAPACK panel operations using parallel cache assignment

Deep packet inspection tools and techniques in commodity platforms: Challenges and trends

COVRA: A compression‐domain output‐sensitive volume rendering architecture based on a sparse representation of voxel blocks

A note on generating finer-grain parallelism in a representation tree

Virtual machine monitors: current technology and future trends
