CONCATENATION AND KLEENE STAR ON DETERMINISTIC FINITE AUTOMATA

Guo-Qiang Zhang, Xiangnan Zhou, Licong Cui, Robert Fraser

doi:10.1142/9789814401531_0056

Abstract

This paper presents direct, explicit algebraic constructions of concatenation and Kleene star on deterministic finite automata (DFA), using the Boolean-matrix method of Zhang [5] and ideas of Kozen [2]. The consequence is threefold: (1) it provides an alternative proof of the classical Kleene's Theorem on the equivalence of regular expressions and DFAs without using nondeterministic finite automata (NFA); (2) it demonstrates how the language constructions of concatenation and Kleene star can be captured elegantly as algebraic laws in the form of "binomial theorems"; (3) it provides a demonstration of the (tight) upper bounds of the state complexity of concatenation and Kleene star, and also offers a way to study the state complexity of NFA.

I. MATRIX-APPROACH TO AUTOMATA THEORY

A Boolean matrix is a matrix (of size $m \times n$) whose elements are either 0 or 1, where the internal operations are carried out over the Boolean algebra. We write $\mathbb{B}^{m \times n}$ for the set of all Boolean matrices of size $m \times n$. A Boolean (row) vector of dimension $n$ is an $n$-tuple $(b_1, b_2, \ldots, b_n)$ of 0s and 1s. We write $\mathbb{B}^n$ for the set of all Boolean vectors of dimension $n$. A column vector is the transpose $(\cdot)^T$ of a row vector. The characteristic vector of a subset $A$ of $\{1, \ldots, n\}$ is the row vector $I_A \in \mathbb{B}^n$ such that the $p$-th component of $I_A$ is 1 if and only if $p \in A$. The characteristic vector of a singleton set $\{p\}$ is written as $I_{\{p\}}$, or simply $I_p$. $O_{m \times n}$ stands for the $(m \times n)$-matrix all of whose elements are 0. When the dimension is fixed by context, we abuse notation and write $O_{n \times n}$ as 0.

A deterministic finite automaton (DFA) is a 5-tuple $M = (Q, \Sigma, \delta, q_0, F)$, where $Q$ is the finite set of states, $\Sigma$ is the alphabet, $\delta : Q \times \Sigma \to Q$ is the transition function, $q_0$ is the start state, and $F$ is the set of final states. For notational convenience, we use initial segments of natural numbers $\{1, 2, \ldots, n\}$ to denote the set of states, and fix 1 to be the start state, for base/background DFAs.
When there is no confusion, we omit the indication of the start state (which is assumed to be state 1 by default). Each $n$-state DFA determines an associated matrix system $\{\Delta_a \mid a \in \Sigma\}$, where $\Delta_a$ is the $(n \times n)$ adjacency matrix of the $a$-labeled subgraph associated with the DFA. In other words, the $(i, j)$ entry of $\Delta_a$ is 1 if and only if $\delta(i, a) = j$. Since $M$ is a DFA, each $\Delta_a$ is row-stochastic (i.e., every row contains exactly one 1). The (Boolean) sum $\Delta$ of all members $\Delta_a$ of the matrix system is the adjacency matrix of the DFA. For a string $w = a_1 a_2 \cdots a_k$ over $\Sigma$, we write $\Delta_w$ for the matrix product $\Delta_{a_1} \Delta_{a_2} \cdots \Delta_{a_k}$. The language accepted by $M$, denoted $L(M)$, is the set $\{w \mid I_{q_0} \Delta_w I_F^T = 1\}$. We refer the reader to [5] for more details on the utility of this matrix approach.

Example 1.1: The matrix system of the following DFA is {( 0 1
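The definitions of this section can be exercised on a small machine. The Python sketch below is illustrative only and is not the DFA of Example 1.1. It assumes a hypothetical 3-state DFA over {a, b} that accepts the strings ending in "ab", and the names `bool_matmul`, `char_vector`, `matrix_system`, `delta_w`, and `accepts` are introduced here for illustration. It builds the matrix system, forms the Boolean product for a string w, and tests acceptance by checking that the start-state row vector times that product times the final-state column vector equals 1.

```python
from typing import Dict, List, Set, Tuple

Matrix = List[List[int]]


def bool_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Matrix product over the Boolean algebra (sum = OR, product = AND)."""
    return [[int(any(a[i][k] and b[k][j] for k in range(len(b))))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def char_vector(subset: Set[int], n: int) -> List[int]:
    """Characteristic row vector of a subset of {1, ..., n}."""
    return [1 if p in subset else 0 for p in range(1, n + 1)]


def matrix_system(n: int, alphabet: str,
                  delta: Dict[Tuple[int, str], int]) -> Dict[str, Matrix]:
    """One n x n matrix per symbol a: entry (i, j) is 1 iff delta(i, a) = j."""
    system: Dict[str, Matrix] = {}
    for a in alphabet:
        m = [[0] * n for _ in range(n)]
        for i in range(1, n + 1):
            m[i - 1][delta[(i, a)] - 1] = 1  # states are 1-based
        system[a] = m
    return system


def delta_w(system: Dict[str, Matrix], w: str, n: int) -> Matrix:
    """Product of the symbol matrices along w (identity for the empty string)."""
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for a in w:
        result = bool_matmul(result, system[a])
    return result


def accepts(system: Dict[str, Matrix], w: str, n: int,
            final: Set[int], start: int = 1) -> bool:
    """True iff (start row vector) * delta_w * (final column vector) = 1."""
    row = [char_vector({start}, n)]             # 1 x n
    col = [[b] for b in char_vector(final, n)]  # n x 1
    return bool_matmul(bool_matmul(row, delta_w(system, w, n)), col) == [[1]]


# Hypothetical DFA (an assumption for illustration): strings ending in "ab".
# State 1 = start, state 2 = last symbol was 'a', state 3 = last two were "ab".
N = 3
DELTA = {(1, "a"): 2, (1, "b"): 1,
         (2, "a"): 2, (2, "b"): 3,
         (3, "a"): 2, (3, "b"): 1}
FINAL = {3}
SYSTEM = matrix_system(N, "ab", DELTA)

if __name__ == "__main__":
    for word in ["ab", "aab", "aba", ""]:
        print(repr(word), accepts(SYSTEM, word, N, FINAL))
```

Because the transition function is total and deterministic, every row of each matrix in `SYSTEM` contains exactly one 1, which is the row-stochastic property noted above.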
