Neural network recognition of spelling errors

Mark Lewellen

doi:10.3115/980691.980816

Abstract

One area in which artificial neural networks (ANNs) may strengthen NLP systems is in the identification of words under noisy conditions. In order to achieve this benefit when spelling errors or spelling variants are present, variable-length strings of symbols must be converted to ANN input/output form---fixed-length arrays of numbers. A common view in the neural network community has been that different forms of input/output representations have negligible effect on ANN performance. This paper, however, shows that input/output representations can in fact affect the performance of ANNs in the case of natural language words. Minimum properties for an adequate word representation are proposed, as well as new methods of word representation.To test the hypothesis that word representations significantly affect ANN performance, traditional and new word representations are evaluated for their ability to recognize words in the presence of four types of typographical noise: substitutions, insertions, deletions and reversals of letters. The results indicate that word representations have a significant effect on ANN performance. Additionally, different types of word representation are shown to perform better on different types of error.

Full Text