Abstract

The concept of edit distance and its variants has applications in many areas such as computational linguistics, bioinformatics, and synchronization error detection in data communications. Here, we revisit the problem of computing the inner edit distance of a regular language given via a Nondeterministic Finite Automaton (NFA). This problem relates to the inherent maximal error-detecting capability of the language in question. We present two efficient algorithms for solving this problem, both of which execute in time O(r^2 n^2 d), where r is the cardinality of the alphabet involved, n is the number of transitions in the given NFA, and d is the computed edit distance. We have implemented one of the two algorithms and present here a set of performance tests. The correctness of the algorithms is based on the connection between word distances and error detection and the fact that nondeterministic transducers can be used to represent the errors (resp., edit operations) involved in error-detection (resp., in word distances).

Highlights

  • The concept of edit distance and its variants has applications in many areas such as computational linguistics [1], bioinformatics [2], and synchronization error detection in data communications [3].

  • The edit distance of a language L with at least two words, referred to as the inner edit distance of L, is the minimum edit distance between any two different words in L.

  • We present two efficient algorithms to compute the inner edit distance of a regular language given via a Nondeterministic Finite Automaton (NFA) with n transitions; see Theorems 1 and 3.
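
To make the definition of inner edit distance concrete, the following is a minimal brute-force sketch for a finite language given as an explicit set of words. It is not one of the paper's algorithms, which work on an NFA and also cover infinite regular languages; the function names `edit_distance` and `inner_edit_distance` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def edit_distance(u, v):
    """Levenshtein distance: fewest deletions, insertions and substitutions turning u into v."""
    prev = list(range(len(v) + 1))          # distances from the empty prefix of u
    for i, a in enumerate(u, start=1):
        curr = [i]                          # deleting the first i symbols of u
        for j, b in enumerate(v, start=1):
            curr.append(min(
                prev[j] + 1,                # delete a from u
                curr[j - 1] + 1,            # insert b into u
                prev[j - 1] + (a != b),     # substitute a with b (free if equal)
            ))
        prev = curr
    return prev[-1]


def inner_edit_distance(words):
    """Minimum edit distance over all pairs of different words (illustrative brute force)."""
    distinct = set(words)
    if len(distinct) < 2:
        raise ValueError("the language must contain at least two words")
    return min(edit_distance(u, v) for u, v in combinations(distinct, 2))


if __name__ == "__main__":
    print(inner_edit_distance({"0000", "0011", "1111"}))
```

The pairwise loop is quadratic in the number of words and so only makes sense for small finite languages, which is why the NFA-based algorithms of the paper are needed in general.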

Summary

Introduction

The concept of edit distance and its variants has applications in many areas such as computational linguistics [1], bioinformatics [2], and synchronization error detection in data communications [3]. [...] O(n^5) for DFAs, and O(n^8) for NFAs. In this paper, we present two efficient algorithms to compute the inner edit distance of a regular language given via an NFA with n transitions; see Theorems 1 and 3. Both algorithms, which are called DistErrDetect and DistInpAlter, have the same worst-case time complexity. We only consider the channel sid(k), for some k ∈ N, such that (u, v) ∈ sid(k) if and only if v can be obtained by applying at most k errors in u, where an error could be a deletion of a symbol in u, a substitution of a symbol in u with another symbol, or an insertion of a symbol in u; see further below for a more rigorous definition via edit-strings.
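
The channel sid(k) described above can be illustrated by enumerating, for a given word u, every word v with (u, v) ∈ sid(k). The sketch below is a hedged illustration and not the transducer construction used in the paper: the helper names `one_error`, `sid_outputs` and `is_error_detecting` are hypothetical, the alphabet is passed as a string of symbols, and error detection is assumed to mean that the channel cannot turn a word of the language into a different word of the language.

```python
def one_error(u, alphabet):
    """All words obtained from u by exactly one deletion, substitution or insertion."""
    out = set()
    for i in range(len(u)):
        out.add(u[:i] + u[i + 1:])                                         # delete u[i]
        out.update(u[:i] + a + u[i + 1:] for a in alphabet if a != u[i])   # substitute u[i]
    for i in range(len(u) + 1):
        out.update(u[:i] + a + u[i:] for a in alphabet)                    # insert a at position i
    return out


def sid_outputs(u, k, alphabet):
    """All v with (u, v) in sid(k), i.e. reachable from u with at most k errors."""
    reached = {u}
    frontier = {u}
    for _ in range(k):
        # Expand by one more error, keeping only words not seen before.
        frontier = {v for w in frontier for v in one_error(w, alphabet)} - reached
        reached |= frontier
    return reached


def is_error_detecting(words, k, alphabet):
    """True if no word of the finite language is mapped by sid(k) to a different word of it."""
    lang = set(words)
    return all(sid_outputs(u, k, alphabet) & lang == {u} for u in lang)


if __name__ == "__main__":
    code = {"0000", "0011", "1111"}
    print(is_error_detecting(code, 1, "01"), is_error_detecting(code, 2, "01"))
```

Under the stated assumption, a finite language is error-detecting for sid(k) exactly when its inner edit distance exceeds k. The enumeration grows quickly with k and the alphabet size, so it is meant only to mirror the definition on small inputs.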

NFAs and Transducers
Edit Strings and Edit Distance
Edit Distance via Error-Detection
Let Ba be the edit distance bound in Lemma 2
An Input-Altering Transducer for Edit-Distance
Let Ba be the bound in Lemma 2
Construct the transducer ia1 (see Figure 2)
Implementation and Testing
Conclusions