Abstract
We describe the formalization of Brzozowski and Antimirov derivative-based algorithms for regular expression parsing in the dependently typed language Agda. For a given regular expression and input string, the formalized algorithms produce either a proof that the string matches the expression or a proof that no match exists. Using the certified algorithms, we developed a tool for regular-expression-based search in the style of the well-known GNU grep. We report practical experiments conducted with this tool.
Highlights
Parsing is the process of analysing whether a string of symbols conforms to a given set of rules
In this work we are interested in the parsing problem for regular languages (RLs) [1], i.e. languages that can be recognized by deterministic finite automata and equivalent formalisms
The syntax of regular expressions (REs) is defined by the following context-free grammar: $e ::= \emptyset \mid \epsilon \mid a \mid e\,e \mid e + e \mid e^{\star}$, where $a$ is any symbol from the underlying alphabet
Summary
Parsing is the process of analysing whether a string of symbols conforms to a given set of rules. Regular expressions (REs) are an algebraic and compact way of specifying regular languages (RLs), and they are extensively used in lexical analyser generators [2] and string search utilities [3]. Since such tools are widely used and parsing is pervasive in computing, there is growing interest in certified parsing algorithms [4,5,6]. We provide a complete formalization of an algorithm for RE parsing using derivatives [8], and describe an RE-based search tool we developed using the dependently typed language Agda [11]. All details can be found in the source code available at [12].
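To make the idea of derivative-based matching concrete, the following is a minimal Python sketch of the textbook Brzozowski construction. It is not the certified Agda development described here: it returns only a boolean instead of a proof, it performs no simplification of intermediate expressions, and all names (`Empty`, `Eps`, `Chr`, `Cat`, `Alt`, `Star`, `nullable`, `deriv`, `matches`) are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


class Regex:
    """Base class for the RE syntax: empty set, empty string, symbol, concatenation, choice, star."""


@dataclass(frozen=True)
class Empty(Regex):      # the empty language: matches nothing
    pass


@dataclass(frozen=True)
class Eps(Regex):        # the empty string
    pass


@dataclass(frozen=True)
class Chr(Regex):        # a single alphabet symbol
    c: str


@dataclass(frozen=True)
class Cat(Regex):        # concatenation e e'
    l: Regex
    r: Regex


@dataclass(frozen=True)
class Alt(Regex):        # choice e + e'
    l: Regex
    r: Regex


@dataclass(frozen=True)
class Star(Regex):       # Kleene star
    e: Regex


def nullable(e: Regex) -> bool:
    """Return True iff e accepts the empty string."""
    if isinstance(e, (Eps, Star)):
        return True
    if isinstance(e, (Empty, Chr)):
        return False
    if isinstance(e, Cat):
        return nullable(e.l) and nullable(e.r)
    if isinstance(e, Alt):
        return nullable(e.l) or nullable(e.r)
    raise TypeError(f"not a Regex: {e!r}")


def deriv(e: Regex, a: str) -> Regex:
    """Brzozowski derivative: an RE for the suffixes w such that a followed by w is in L(e)."""
    if isinstance(e, (Empty, Eps)):
        return Empty()
    if isinstance(e, Chr):
        return Eps() if e.c == a else Empty()
    if isinstance(e, Alt):
        return Alt(deriv(e.l, a), deriv(e.r, a))
    if isinstance(e, Cat):
        left = Cat(deriv(e.l, a), e.r)
        # If the left part can be empty, the symbol may also be consumed by the right part.
        return Alt(left, deriv(e.r, a)) if nullable(e.l) else left
    if isinstance(e, Star):
        return Cat(deriv(e.e, a), e)
    raise TypeError(f"not a Regex: {e!r}")


def matches(e: Regex, s: str) -> bool:
    """Derive e by each symbol of s in turn, then test the result for nullability."""
    for a in s:
        e = deriv(e, a)
    return nullable(e)
```

For example, with `e = Cat(Star(Alt(Chr("a"), Chr("b"))), Chr("c"))`, the call `matches(e, "abac")` derives `e` by each of the four symbols and then checks whether the resulting expression is nullable. Because this sketch never simplifies the derived expressions, they grow with the input length; a practical implementation would normalise them after each step.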