A dictionary retrieval algorithm using two trie structures

Katsushi Morimoto, Jun-Ichi Aoe, Hirokazu Iriguchi

doi:10.1002/scj.4690260209

Abstract

The trie allows retrieval to proceed with the character symbols composing the key as the unit, so that high-speed retrieval is achieved independently of the total number of keys. Consequently, it is used frequently in natural language dictionary search and in other problems. A problem, however, is that the number of trie states increases as the key set grows, which requires a larger memory capacity. To remedy this, the DAWG (Directed Acyclic Word-Graph) has been proposed, in which the common suffixes of the trie are compressed. A new problem then arises: the record information can no longer be determined uniquely for a key. To address this, the paper introduces a new structure in which the number of states is reduced by merging the common suffixes of the trie, while the record information for each key is still determined uniquely. Algorithms for retrieval, insertion and deletion of keys are proposed for the structure. In the proposed method, the set of keys is represented using two tries. One trie stores, for each key, the prefix of minimum length that discriminates that key uniquely from all other keys. The other trie stores the remaining suffixes of the keys so that their common suffixes can be merged. A simulation was carried out for various key sets, including Chinese characters (Kanji), alphabetic characters and Japanese Katakana characters, and the number of states was reduced by approximately 30 to 65 percent for key sets of 50,000 words, compared to the ordinary trie.
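The two-trie idea described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it builds the structure statically from a complete key set and omits insertion and deletion. Several details are assumptions made for illustration, namely the end marker `END`, the names `Node`, `build_two_trie` and `lookup`, storing the suffixes reversed in the second ("rear") trie, and checking a suffix by walking from a rear-trie node back to the rear root through parent pointers.

```python
END = "#"  # assumed end-of-key marker; keys must not contain it


class Node:
    def __init__(self, parent=None, char=None):
        self.parent = parent   # lets us walk the rear trie back to its root
        self.char = char       # label of the edge leading into this node
        self.children = {}
        self.link = None       # front-trie leaf only: entry node in the rear trie
        self.record = None     # front-trie leaf only: record of exactly one key


def child(node, ch):
    """Return the child of `node` labelled `ch`, creating it if needed."""
    nxt = node.children.get(ch)
    if nxt is None:
        nxt = node.children[ch] = Node(node, ch)
    return nxt


def common_prefix_len(a, b):
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def build_two_trie(records):
    """records: dict key -> record. Returns (front_root, rear_root)."""
    words = sorted(k + END for k in records)
    front, rear = Node(), Node()
    for i, w in enumerate(words):
        # In sorted order, the shortest unique prefix of w is one character
        # longer than its longest common prefix with either neighbour.
        neighbours = words[max(i - 1, 0):i] + words[i + 1:i + 2]
        cut = 1 + max((common_prefix_len(w, o) for o in neighbours), default=0)
        leaf = front
        for ch in w[:cut]:                # discriminating prefix -> front trie
            leaf = child(leaf, ch)
        entry = rear
        for ch in reversed(w[cut:]):      # remaining suffix, reversed -> rear trie
            entry = child(entry, ch)
        leaf.link = entry
        leaf.record = records[w[:-1]]
    return front, rear


def lookup(front, key):
    """Return the record stored for `key`, or None if the key is absent."""
    w = key + END
    node, i = front, 0
    while node.link is None:              # descend the front trie to a leaf
        if i == len(w):
            return None
        node = node.children.get(w[i])
        if node is None:
            return None
        i += 1
    leaf, node = node, node.link
    while node.parent is not None:        # read the suffix back toward the rear root
        if i == len(w) or node.char != w[i]:
            return None
        node, i = node.parent, i + 1
    return leaf.record if i == len(w) else None


def build_plain_trie(keys):
    """Ordinary trie over the same keys, used only for comparing state counts."""
    root = Node()
    for k in keys:
        node = root
        for ch in k + END:
            node = child(node, ch)
    return root


def count_states(root):
    return 1 + sum(count_states(c) for c in root.children.values())
```

Because every key ends at its own leaf in the front trie, a record can be attached to that leaf without ambiguity, while keys such as "reading" and "writing" share the rear-trie states for "ing". Calling `count_states` on the front and rear roots and on the result of `build_plain_trie` gives a rough comparison of state counts for a given key set; any saving depends on how many suffixes the keys share.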

Full Text