Abstract

This paper describes a technique that we believe can be of great help in many text-processing situations, and reports on an experiment recently conducted to test its validity and scope. As background, the following sections present some fundamental clarifications and remarks on our specific view of lemmatization and disambiguation. Our starting point is a double assertion that we believe would be shared by many workers in applied computational linguistics and in large text-processing projects: on the one hand, lemmatization is one of the most crucial steps in many non-trivial text-processing cycles; on the other hand, no operational, reasonably general, fully automatic, high-quality, context-sensitive text-lemmatization system is at present easily accessible for any natural language. Given these two premises, the problem is how to introduce at least a partial element of machine-aided work into the process of text lemmatization, so as to avoid the extremely laborious and frustrating task of manually lemmatizing large corpora word by word, as was done in the early days of automatic text-processing projects. (For a thorough report on mechanical lemmatization programs, see ref. 4.) In this paper we focus on the analysis and experimental testing of one idea that fits naturally into this framework, namely that of disambiguation by short contexts. (The somewhat unexpected shift from "lemmatization" to "disambiguation" will be justified in the sections to come.) Based on

