Exploring efficient grouping algorithms in regular expression matching.

Chengcheng Xu,Jinshu Su,Shuhui Chen

doi:10.1371/journal.pone.0206068

Open Access

https://doi.org/10.1371/journal.pone.0206068

Journal: PLOS ONE	Publication Date: Oct 24, 2018
Citations: 5	License type: CC BY 4.0

Affiliation: National University of Defense Technology

Abstract

Background

Regular expression matching (REM) is widely employed as the major tool for deep packet inspection (DPI) applications. For automatic processing, the regular expression patterns need to be converted to a deterministic finite automaton (DFA). However, with the ever-increasing scale and complexity of pattern sets, the state explosion problem has become a great challenge for DFA-based regular expression matching. Rule grouping is a direct method to solve the state explosion problem. The original rule set is divided into multiple disjoint groups, and each group is compiled to a separate DFA, which significantly restrains the severe state explosion that occurs when all the rules are compiled to a single DFA.

Objective

For practical implementation, the total number of DFA states should be as small as possible, so that the data structures of these DFAs can be deployed on fast on-chip memories for rapid access. In addition, to support fast pattern updates in some applications, the time cost of grouping should be as small as possible. In this study, we aimed to propose an efficient grouping method that generates as few states as possible with as little time overhead as possible.

Methods

When compiling multiple patterns into a single DFA, the number of DFA states is usually greater than the total number of states obtained when compiling each pattern to a separate DFA. This is mainly caused by the semantic overlaps among different rules. By quantifying the interaction values for each pair of rules, the rule grouping problem can be reduced to the maximum k-cut graph partitioning problem. We then propose a heuristic algorithm called the one-step greedy (OSG) algorithm to solve this NP-hard problem. In addition, a subroutine named the heuristic initialization (HI) algorithm is devised to further optimize the grouping algorithms.

Results

We employed three practical rule sets for the experimental evaluation. Results show that the OSG algorithm outperforms the state-of-the-art grouping solutions regarding both the total number of DFA states and the time cost of grouping. The HI subroutine also demonstrates a significant optimization effect on the grouping algorithms.

Conclusions

The DFA state explosion problem has become the most challenging issue in regular expression matching applications. Rule grouping is a practical approach that divides the original rule set into multiple disjoint groups. In this paper, we investigate the current grouping solutions and propose a compact and efficient grouping algorithm. Experiments conducted on practical rule sets demonstrate the superiority of our proposal.
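
The reduction to maximum k-cut described in the abstract can be illustrated with a small sketch. This is not the authors' OSG or HI algorithm, whose details are not given on this page; it is a generic greedy placement heuristic, assuming rules are numbered from 0 and that pairwise interaction values are already available in a hypothetical `weights` dictionary keyed by `(i, j)` with `i < j`. Each rule is placed in the group where it adds the least interaction with rules already there, which is equivalent to pushing as much interaction weight as possible across the cut.

```python
def pair_weight(weights, a, b):
    """Interaction value between rules a and b (0 if the pair is not listed)."""
    return weights.get((min(a, b), max(a, b)), 0)


def cut_weight(weights, assignment):
    """Total interaction between rules that ended up in different groups."""
    return sum(w for (i, j), w in weights.items()
               if assignment[i] != assignment[j])


def greedy_group(n_rules, k, weights):
    """Place rules one by one into the group with the least added interaction.

    Generic greedy heuristic for max k-cut (illustrative only, not the
    paper's OSG algorithm). Ties go to the lowest-numbered group.
    """
    assignment = {}
    for rule in range(n_rules):
        # Interaction this rule would add inside each candidate group.
        cost = [0] * k
        for placed, group in assignment.items():
            cost[group] += pair_weight(weights, rule, placed)
        assignment[rule] = min(range(k), key=lambda g: cost[g])
    return assignment


# Hypothetical interaction values for four rules (made up for illustration).
example_weights = {(0, 1): 10, (2, 3): 8, (0, 2): 1, (1, 3): 1}
example_groups = greedy_group(4, 2, example_weights)
print(example_groups, cut_weight(example_weights, example_groups))
```

In this toy input, the strongly interacting pairs (0, 1) and (2, 3) are separated into different groups, so no interaction weight remains inside either group. In practice the interaction values would have to be measured, for example by comparing the state count of a jointly compiled DFA with the sum of the separately compiled ones, which this sketch does not attempt.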

Highlights

Modern network services increasingly rely on the processing of stream payloads
Results show that the one-step greedy (OSG) algorithm outperforms the state-of-the-art grouping solutions regarding both the total number of deterministic finite automata (DFA) states and time cost for grouping
Rule grouping is a practical approach that divides the original rule set into multiple disjoint groups

Summary

Background

Regular expression matching (REM) is widely employed as the major tool for deep packet inspection (DPI) applications. The regular expression patterns need to be converted to a deterministic finite automaton (DFA). With the ever-increasing scale and complexity of pattern sets, the state explosion problem has become a great challenge for DFA-based regular expression matching. Rule grouping is a direct method to solve the state explosion problem. The original rule set is divided into multiple disjoint groups, and each group is compiled to a separate DFA, which significantly restrains the severe state explosion that occurs when all the rules are compiled to a single DFA.

