HIERARCHICALLY CODED LEXICON WITH VARIANTS

François De Bertrand De Beuvron, Philippe Trigano

doi:10.1142/s0218001495000080

Abstract

This paper describes a new lexicon organization intended to provide high-speed string correction. The method combines two approaches: a code-based partition of the lexicon using a hierarchical file for the first part [21], and the generation of neighboring strings or codes for the second [3, 18]. The lexicon is preprocessed using a hierarchy of codes, and this preprocessing can be done incrementally as each new word is entered. The resulting size of the lexicon remains proportional to the number of words entered, although the proportionality factor is rather high. Given a new string, a set of candidate corrections can then be found in constant time. This set contains at least all the desired neighbors (in a sense defined by the user) of the given string. The method is therefore well suited to small and medium-sized lexicons in applications where correction speed is crucial; it has been successfully tested with a French lexicon of about 10,000 words collected from scientific texts.
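The abstract does not specify the code hierarchy itself, so the following is only a minimal Python sketch of the general idea (partition the lexicon by code, then probe the codes neighboring the query), built on assumptions of our own. It uses a hypothetical two-level code (word length, then the word's letters in sorted order), files each word incrementally as it is added, and takes "within one edit, counting adjacent transpositions" as the user-defined notion of neighbor. The names `CodedLexicon`, `code_of`, `osa_distance` and `ALPHABET` are illustrative, not the authors' method.

```python
import string
from collections import defaultdict

# Assumption: lower-case words over this alphabet; a French lexicon
# needs the accented letters too. Letters outside it are never probed.
ALPHABET = string.ascii_lowercase + "àâçéèêëîïôùûüÿœæ"


def code_of(word):
    """Hypothetical two-level code: (length, letters in sorted order)."""
    return len(word), "".join(sorted(word))


def osa_distance(a, b):
    """Optimal string alignment distance: insertions, deletions,
    substitutions and adjacent transpositions each cost 1."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]


class CodedLexicon:
    """Hypothetical coded lexicon: length -> sorted-letter key -> words."""

    def __init__(self):
        self.levels = defaultdict(lambda: defaultdict(set))

    def add(self, word):
        """Incremental preprocessing: file one new word under its code."""
        length, key = code_of(word)
        self.levels[length][key].add(word)

    def neighbor_codes(self, word):
        """Every code reachable from `word` by at most one edit.

        A transposition keeps the same sorted letters, so the word's own
        code covers it; a substitution is a deletion plus an insertion.
        """
        n, letters = code_of(word)
        codes = {(n, letters)}
        for i in range(n):
            deleted = letters[:i] + letters[i + 1:]
            codes.add((n - 1, deleted))
            for c in ALPHABET:
                codes.add((n, "".join(sorted(deleted + c))))
        for c in ALPHABET:
            codes.add((n + 1, "".join(sorted(letters + c))))
        return codes

    def candidates(self, word):
        """Superset of the lexicon words within one edit of `word`.

        The number of buckets probed depends on len(word) and the
        alphabet size, not on how many words are stored.
        """
        found = set()
        for length, key in self.neighbor_codes(word):
            bucket = self.levels.get(length)  # .get avoids creating entries
            if bucket is not None:
                found |= bucket.get(key, set())
        return found

    def corrections(self, word):
        """Keep only the candidates that really are within one edit."""
        return sorted(w for w in self.candidates(word)
                      if osa_distance(word, w) <= 1)


lexicon = CodedLexicon()
for w in ["mot", "mots", "tom", "mat", "mort", "lexique"]:
    lexicon.add(w)
print(lexicon.corrections("mto"))  # ['mot']
```

Because every single insertion, deletion or substitution maps the query's key to one of the probed keys, and a transposition leaves the key unchanged, the candidate set contains every stored word within one edit, provided all letters are in `ALPHABET`. It may also contain extra words such as anagrams (here `tom`, `mat`, `mots`, `mort` for the query `mto`), which the final distance check discards.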
