Abstract
This article describes a natural-language text-processing system designed as an automatic aid to subject indexing at BIOSIS. The intellectual procedure the system should model is a deep indexing with a controlled vocabulary of biological concepts — Concept Headings (CHs). On the average, ten CHs are assigned to each article by BIOSIS indexers. The automatic procedure consists of two stages: (1) translation of natural-language biological titles into title-semantic representations which are in the constructed formalized language of Concept Primitives, and (2) translation of the latter representations into the language of CHs. The first stage is performed by matching the titles against the system's Semantic Vocabulary (SV). The SV currently contains approximately 15,000 biological natural-language terms and their translations in the language of Concept Primitives. For the ambiguous terms, the SV contains the algorithmical rules of term disambiguation, rules based on semantic analysis of the contexts. The second stage of the automatic procedure is performed by matching the title representations against the CH definitions, formulated as Boolean search strategies in the language of Concept Primitives. Three experiments performed with the system and their results are described. The most typical problems the system encounters, the problems of lexical and situational ambiguities, are discussed. The disambiguation techniques employed are described and demonstrated in many examples. © 1987 John Wiley & Sons, Inc.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
More From: Journal of the American Society for Information Science
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.