As a core technique of deep packet inspection, regular expression matching extracts valuable content from network traffic, and its performance is crucial to network security and big data applications. However, the mobile internet generates massive network traffic, most of which is compressed for a better user experience, and this makes fast matching over compressed data challenging. Existing work focuses on LZ77 compressed data rather than on the newer hybrid dictionary-based compression schemes. Vcdiff, a representative of these schemes, combines a static shared dictionary with a self-adaptive dynamic dictionary to achieve better compression density, and it also compresses and decompresses quickly. However, existing methods for accelerating matching over Vcdiff compressed data apply only to string matching and do not deliver impressive performance. In this paper, we propose ERASER, which accelerates regular expression matching over hybrid dictionary-based compressed data and is the first approach to do so over Vcdiff compressed data. Experiments show that ERASER can double the speed of existing methods on matching Vcdiff compressed data. It also remarkably improves preprocessing and matching performance and outperforms the state-of-the-art approach over LZ77 compressed data. We also propose a model to explore the correlation between matching speed and compression density. The experiments demonstrate that a method's matching speed increases as the compression density of the compressed data improves.