Memory-Efficient Implementation of Deterministic Finite Automata by Merging States with the Same Input Character

Yoon-Ho Choi

doi:10.13089/jkiisc.2013.23.3.395

Abstract

Pattern matching is a key functional component that determines the performance of intrusion detection and prevention systems, and patterns are generally expressed as regular expressions (regexes). As attack patterns grow more complex and diverse, the regular expressions that describe them also grow in complexity and number. Pattern matching algorithms recognize regular expressions with deterministic finite automata (DFA) because of their guaranteed worst-case matching performance, but the number of DFA states consequently grows explosively (state blowup). Much research has addressed this problem in order to build memory-efficient data structures. Most of that work proposes effective ways to reduce the number of states within the deterministic finite automaton converted from a single regular expression. Such approaches, however, reduce only the states within a single pattern and cannot curb the growth in states that comes with a larger number of patterns. To overcome this limitation, this paper proposes a state reduction scheme that merges states across the finite automata built from multiple regular expressions. By merging states that have the same input character, the proposed scheme reduces the number of states in the finite automata and achieves an average memory reduction of 40.0% compared with a conventional DFA.
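The state blowup mentioned above can be made concrete with a small experiment. The following is a minimal sketch, not the algorithm proposed in this paper: it applies the textbook subset construction to an NFA for the regex `(a|b)*a(a|b){n}`, a standard example chosen here as an assumption, and counts the reachable DFA states. The function names `nfa_delta` and `count_dfa_states` are hypothetical helpers introduced only for illustration.

```python
from collections import deque


def nfa_delta(n):
    """Transition table of an NFA for (a|b)*a(a|b){n}.

    States are 0..n+1; state n+1 is accepting and has no outgoing edges.
    """
    # State 0 loops on both characters and guesses on 'a' that the
    # final n+1 characters of the input start here.
    delta = {(0, "a"): {0, 1}, (0, "b"): {0}}
    for i in range(1, n + 1):
        for ch in "ab":
            delta[(i, ch)] = {i + 1}
    return delta


def count_dfa_states(n):
    """Count reachable DFA states produced by the subset construction."""
    delta = nfa_delta(n)
    start = frozenset({0})
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for ch in "ab":
            # The DFA successor is the union of all NFA successors.
            nxt = frozenset(t for s in current for t in delta.get((s, ch), ()))
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen)


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"n={n}: NFA states={n + 2}, DFA states={count_dfa_states(n)}")
```

For this pattern the NFA grows linearly in `n` while the DFA state count doubles with each increment, which is the kind of growth that motivates state reduction schemes.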
