Abstract

We describe the formalization of Brzozowski and Antimirov derivative-based algorithms for regular expression parsing in the dependently typed language Agda. The formalization produces a proof that either an input string matches a given regular expression or that no match exists. Using the certified algorithms, we developed a tool for regular-expression-based search in the style of the well-known GNU grep. We report practical experiments conducted with this tool.

Highlights

  • Parsing is the process of analysing whether a string of symbols conforms to a given set of rules

  • In this work we are interested in the parsing problem for regular languages (RLs) [1], i.e. languages that can be recognized by deterministic finite automata and equivalent formalisms

  • The syntax of regular expressions (REs) is defined by the following context-free grammar: e ::= ∅ | ε | a | e e | e + e | e⋆, where a is any symbol from the underlying alphabet

Summary

Introduction

Parsing is the process of analysing whether a string of symbols conforms to a given set of rules. Regular expressions (REs) are a compact, algebraic way of specifying regular languages (RLs) and are used extensively in lexical analyser generators [2] and string search utilities [3]. Since such tools are widely used and parsing is pervasive in computing, there is growing interest in certified parsing algorithms [4,5,6]. We provide a complete formalization of an algorithm for RE parsing using derivatives [8], and describe an RE-based search tool we developed using the dependently typed language Agda [11]. All details can be found in the source code available at [12].
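
To make the idea of derivative-based matching concrete, here is a minimal, unverified sketch in Python. It is not the Agda formalization described above: it only decides whether a string matches and produces no proof or parse tree. The class names (Empty, Eps, Chr, Cat, Alt, Star) and the functions nullable, deriv and matches are names assumed for this illustration. The sketch follows the standard Brzozowski construction: take the derivative of the expression with respect to each input symbol in turn, then check whether the final expression accepts the empty string.

```python
from dataclasses import dataclass


class Re:
    """Base class for regular expression syntax trees (illustrative names)."""


@dataclass(frozen=True)
class Empty(Re):   # ∅: matches no string
    pass


@dataclass(frozen=True)
class Eps(Re):     # ε: matches only the empty string
    pass


@dataclass(frozen=True)
class Chr(Re):     # a: matches the single symbol c
    c: str


@dataclass(frozen=True)
class Cat(Re):     # e e: concatenation
    l: Re
    r: Re


@dataclass(frozen=True)
class Alt(Re):     # e + e: choice
    l: Re
    r: Re


@dataclass(frozen=True)
class Star(Re):    # e⋆: Kleene star
    e: Re


def nullable(e: Re) -> bool:
    """Return True exactly when e accepts the empty string."""
    if isinstance(e, (Eps, Star)):
        return True
    if isinstance(e, (Empty, Chr)):
        return False
    if isinstance(e, Cat):
        return nullable(e.l) and nullable(e.r)
    if isinstance(e, Alt):
        return nullable(e.l) or nullable(e.r)
    raise TypeError(f"not a regular expression: {e!r}")


def deriv(e: Re, a: str) -> Re:
    """Brzozowski derivative: an RE for the suffixes w such that e accepts a·w."""
    if isinstance(e, (Empty, Eps)):
        return Empty()
    if isinstance(e, Chr):
        return Eps() if e.c == a else Empty()
    if isinstance(e, Alt):
        return Alt(deriv(e.l, a), deriv(e.r, a))
    if isinstance(e, Cat):
        left = Cat(deriv(e.l, a), e.r)
        # If the left part can be empty, the symbol may also start the right part.
        return Alt(left, deriv(e.r, a)) if nullable(e.l) else left
    if isinstance(e, Star):
        return Cat(deriv(e.e, a), e)
    raise TypeError(f"not a regular expression: {e!r}")


def matches(e: Re, s: str) -> bool:
    """Derive e by each symbol of s, then test the result for nullability."""
    for a in s:
        e = deriv(e, a)
    return nullable(e)
```

For example, with e = Cat(Star(Alt(Chr("a"), Chr("b"))), Chr("c")), which encodes (a + b)⋆ c, the call matches(e, "abac") derives e four times and then tests nullability. This sketch performs no simplification of intermediate expressions, so they can grow with the input length; the outline below lists a section on smart constructors, which is where such simplification is usually handled.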

An Overview of Agda
Regular Expressions
Smart Constructors
Brzozowski Derivatives and their Properties
Antimirov’s Partial Derivatives and its Properties
Parsing
Implementation Details and Experiments
Related Work
Conclusion