System design for detection and correction of spelling errors in scientific and scholarly text

Joseph J Pollock, Antonio Zamora

doi:10.1002/asi.4630350206

Abstract

The SPEEDCOP project, recently completed at Chemical Abstracts Service (CAS), extracted over 50,000 misspellings from approximately 25,000,000 words of text from seven scientific and scholarly databases. The misspellings were automatically classified and analyzed, and the results were used to design and implement a program that proved capable of correcting most such errors. Analysis of the performance of the spelling error detection and correction programs highlighted the features that should be incorporated into a powerful and user-friendly interactive system suitable for nonprogrammers. These include document-level thresholds for misspelling detection, automatic reuse of user decisions, and user verification and control of correction. An advantage of the proposed design is that the system automatically customizes itself to its environment. This article is primarily concerned with system design, not implementation details.

Full Text