Automatic detection and correction of spelling errors in a large data base

Antonio Zamora

doi:10.1002/asi.4630310106

Abstract

On‐line bibliographic search systems tend to increase the visibility of spelling errors through the use of indexes of unique terms; even low error rates in a data base can result in large numbers of misspelled terms in these indexes. This article describes the techniques used to detect and correct spelling errors in the data base of Chemical Abstracts Service. A computer program for spelling error detection achieves a high level of performance using hashing techniques for dictionary look‐up and compression. Heuristic procedures extend the dictionary and increase the proportion of misspelled words in the words flagged. Automatic correction procedures are applied only to words which are known to be misspelled; other corrections are performed manually during the normal editorial cycle. The constraints imposed on the selection of a spelling error detection technique by a complex data base, human factors, and high‐volume production are discussed.
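
The abstract does not specify how hashing is used for dictionary look‐up and compression, so the following is only a minimal sketch of one common approach, not the article's program. It assumes a bit table in which each dictionary word sets a single bit chosen by a hash code, so the words themselves are never stored. The names `HashedDictionary` and `flag_unknown`, the table size, and the use of MD5 as the hash function are all hypothetical choices made for illustration.

```python
import hashlib


class HashedDictionary:
    """Bit table that records hash codes of words, not the words themselves.

    Illustrative assumption: one hash function and one bit per word.
    """

    def __init__(self, num_bits=1 << 16):
        self.num_bits = num_bits
        self.bits = bytearray(num_bits // 8 + 1)

    def _slot(self, word):
        # Case-fold the word, then map it to a bit position in the table.
        digest = hashlib.md5(word.lower().encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, word):
        slot = self._slot(word)
        self.bits[slot // 8] |= 1 << (slot % 8)

    def __contains__(self, word):
        slot = self._slot(word)
        return bool(self.bits[slot // 8] & (1 << (slot % 8)))


def flag_unknown(words, dictionary):
    """Return the words whose hash code is absent from the dictionary."""
    return [w for w in words if w not in dictionary]


dictionary = HashedDictionary()
for known in ("chemical", "abstracts", "service"):
    dictionary.add(known)

# Words that were added are never flagged, regardless of letter case.
print(flag_unknown(["Chemical", "service"], dictionary))
```

The trade-off in a table like this is that a misspelled word can occasionally hash to a bit that is already set and so escape detection. A larger table makes such collisions rarer at the cost of more storage.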

Full Text