Efficient regular expression matching on LZ77 compressed strings using negative factors

Yutong Han, Bin Wang, Tao Qiu, Xiaochun Yang, Huaijie Zhu

doi:10.1007/s11280-019-00667-z

Abstract

State-of-the-art approaches to regular expression matching on LZ78 compressed strings are inefficient. Moreover, LZ78 compression has shortcomings relative to LZ77 (a closely related Lempel-Ziv scheme), such as a higher compression ratio and slower decompression. In this paper, we study regular expression matching on LZ77 compressed strings. To address this problem, we propose an efficient algorithm, RELZ, which uses the positive factors (the prefix and the suffix) and the negative factors (substrings that cannot appear in an answer) of the regular expression to prune candidates. To locate these two kinds of factors quickly on the compressed string without decompression, we design a variant of the suffix trie index, called SSLZ, and construct bitmaps for the factors of the regular expression to detect candidates. Because SSLZ has a high space cost, we also propose a variant index that partially maintains the suffixes of high-frequency phrases, and develop an efficient regular expression matching algorithm based on this index, RELZ+. In addition, we propose two optimization strategies, block filtering and LZ filtering, to prune false positive candidates. Finally, we conduct a comprehensive performance evaluation on four real data sets to validate our ideas and the proposed algorithms. The experimental results show that RELZ and RELZ+ significantly outperform the existing algorithms.
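The pruning idea in the abstract can be illustrated with a minimal sketch. It is not the RELZ algorithm: it works on a plain, uncompressed string, uses no SSLZ index or bitmaps, and takes the prefix, suffix, and negative factors as hand-supplied arguments instead of deriving them from the regular expression. The function names `find_all` and `match_with_factors` are hypothetical. The sketch only shows how positive factors generate candidate windows and how negative factors discard windows before the regular expression is verified.

```python
import re


def find_all(text, factor):
    """Return every start offset at which `factor` occurs in `text`."""
    hits = []
    i = text.find(factor)
    while i != -1:
        hits.append(i)
        i = text.find(factor, i + 1)
    return hits


def match_with_factors(text, pattern, prefix, suffix, negative_factors):
    """Return (start, end) spans of `text` that fully match `pattern`.

    Assumption: every match starts with `prefix`, ends with `suffix`,
    and contains none of `negative_factors`. The caller supplies these.
    """
    regex = re.compile(pattern)
    starts = find_all(text, prefix)
    ends = [i + len(suffix) for i in find_all(text, suffix)]
    min_len = max(len(prefix), len(suffix))
    answers = []
    for s in starts:
        for e in ends:
            if e - s < min_len:
                continue  # window too short to hold both positive factors
            window = text[s:e]
            # Negative-factor pruning: skip the window without running the regex.
            if any(neg in window for neg in negative_factors):
                continue
            if regex.fullmatch(window):
                answers.append((s, e))
    return answers


# Hand-picked factors for a(b|c)*d: no match can contain these substrings.
negatives = ["aa", "ba", "ca", "da", "db", "dc", "dd"]
text = "xxabcbdxadxabad"
print(match_with_factors(text, r"a(b|c)*d", "a", "d", negatives))
# -> [(2, 7), (8, 10), (13, 15)]
```

In this example the windows that contain `ba` are rejected by a substring test alone, so the regular expression is evaluated on fewer candidates than the prefix and suffix occurrences would otherwise produce.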

Journal: World Wide Web
Publication Date: Mar 23, 2019
Citations: 4
