Disambiguation by short contexts

Yaacov Choueka, Serge Lusignan

doi:10.1007/bf02259530

Abstract

This paper describes a technique that we believe can be of great help in many text-processing situations, and reports on an experiment recently conducted to test its validity and scope. As background, we shall present in the following sections some fundamental clarifications and remarks on our specific view of lemmatization and disambiguation. Our starting point is a double assertion that we believe would be shared by many workers in applied computational linguistics and large text-processing projects, to wit: that on the one hand lemmatization is one of the most important and crucial steps in many non-trivial text-processing cycles, but on the other hand, no operational, reasonably general, fully automatic and high-quality context-sensitive text-lemmatization system is nowadays easily accessible for any natural language. Given these two premises, the problem is how to introduce at least a partial element of machine-aided work into the process of text-lemmatization, so as to avoid the extremely laborious and frustrating task of a word-by-word manual lemmatization of large corpora, as was done in the early days of automatic text-processing projects. (For a thorough report on mechanical lemmatization programs, see ref. 4.) In this paper we focus on the analysis and experimental testing of one idea that fits naturally into this framework, namely that of disambiguation by short contexts. (The somewhat unexpected shift from "lemmatization" to "disambiguation" will be justified in the sections to come.) Based on

Full Text