Fast string correction with Levenshtein automata

Klaus U. Schulz, Stoyan Mihov

doi:10.1007/s10032-002-0082-8

Abstract

The Levenshtein distance between two words is the minimal number of insertions, deletions or substitutions needed to transform one word into the other. Levenshtein automata of degree n for a word W are defined as finite state automata that recognize the set of all words V for which the Levenshtein distance between V and W does not exceed n. We show how to compute, for any fixed bound n and any input word W, a deterministic Levenshtein automaton of degree n for W in time linear in the length of W. Given an electronic dictionary that is implemented in the form of a trie or a finite state automaton, the Levenshtein automaton for W can be used to control search in the lexicon in such a way that exactly those lexical words V are generated whose Levenshtein distance to W does not exceed the given bound. This leads to a very fast method for correcting corrupted input words of unrestricted text using large electronic dictionaries. We then introduce a second method that avoids the explicit computation of Levenshtein automata and is even more efficient. Evaluation results are given that also address variants of both methods based on modified Levenshtein distances in which further primitive edit operations (transpositions, merges and splits) are used.
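
The two ideas in the abstract, the distance itself and a dictionary search that is cut off as soon as the bound n can no longer be met, can be illustrated with a minimal Python sketch. It is not the paper's construction: instead of a precomputed deterministic Levenshtein automaton, it carries one dynamic-programming row along each trie path and prunes a branch once every entry in the row exceeds n. The names `TrieNode`, `build_trie` and `search` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


def levenshtein(v: str, w: str) -> int:
    """Minimal number of insertions, deletions or substitutions turning v into w."""
    prev = list(range(len(w) + 1))  # distances from the empty prefix of v
    for i, cv in enumerate(v, start=1):
        cur = [i]
        for j, cw in enumerate(w, start=1):
            cur.append(min(prev[j] + 1,                # delete cv
                           cur[j - 1] + 1,             # insert cw
                           prev[j - 1] + (cv != cw)))  # substitute or match
        prev = cur
    return prev[-1]


@dataclass
class TrieNode:
    """Hypothetical dictionary node: child links plus an end-of-word flag."""
    children: Dict[str, "TrieNode"] = field(default_factory=dict)
    is_word: bool = False


def build_trie(words: Iterable[str]) -> TrieNode:
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def search(root: TrieNode, w: str, n: int) -> List[Tuple[str, int]]:
    """Return all dictionary words V with distance(V, w) <= n, sorted.

    Assumption: the DP row stands in for the automaton state; a branch is
    abandoned when no entry of its row is <= n, since row minima never
    decrease along a path.
    """
    results: List[Tuple[str, int]] = []

    def walk(node: TrieNode, prefix: str, row: List[int]) -> None:
        if node.is_word and row[-1] <= n:
            results.append((prefix, row[-1]))
        for ch, child in node.children.items():
            nxt = [row[0] + 1]
            for j in range(1, len(w) + 1):
                nxt.append(min(row[j] + 1,
                               nxt[j - 1] + 1,
                               row[j - 1] + (w[j - 1] != ch)))
            if min(nxt) <= n:  # some extension may still stay within the bound
                walk(child, prefix + ch, nxt)

    walk(root, "", list(range(len(w) + 1)))
    return sorted(results)
```

For example, with a trie built from `["hello", "help", "hell", "world", "held", "yellow"]`, `search(trie, "helo", 1)` returns the four words at distance 1 (`held`, `hell`, `hello`, `help`) without ever descending into the branches for `world` or `yellow`. Each trie edge costs time proportional to the length of W here; the point of the paper's deterministic automaton is to replace that row update with a single state transition.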

Journal: International Journal on Document Analysis and Recognition
Publication Date: Nov 1, 2002
Citations: 163

Similar Papers

Text disambiguation by finite state automata, an algorithm and experiments on corpora
Emmanuel Roche
01 Jan 1992

SetExp: a method of transformation of timed automata into finite state automata
Lucien Ouedraogo ... Mustapha Nourelfath
Real-Time Systems | VOL. 46
03 Aug 2010

Spelling Correction for Text Documents in Bahasa Indonesia Using Finite State Automata and Levinshtein Distance Method
Viny Christanti Mawardi ... Niko Susanto
MATEC Web of Conferences | VOL. 164
01 Jan 2018

Transliteration of Latin Letters to Bali Characters Based on Unicode for Mobile Devices using Finite State Automata and Levenshtein Distance
I Made Subrata Sandhiyasa ... Luh Fibriyanthini
sinkron | VOL. 8
04 Aug 2024
