Abstract

Term conflation is the process of linking together different variants of the same term. In automatic term recognition approaches, all term variants should be aggregated into a single normalized term representative, which is associated with a single domain-specific concept as a latent variable. In a previous study, we described FlexiTerm, an unsupervised method for recognition of multi-word terms from a domain-specific corpus. It uses a range of methods to normalize three types of term variation: orthographic, morphological, and syntactic. Acronyms, which represent a highly productive type of term variation, were not supported. In this paper, we describe how the functionality of FlexiTerm has been extended to recognize acronyms and incorporate them into the term conflation process. The main contribution of this paper is not acronym recognition per se, but rather its integration with other types of term variation into the term conflation process. We evaluated the effects of term conflation in the context of information retrieval, one of its most prominent applications. On average, relative recall increased by 32 percentage points, whereas the index compression factor increased by 7 percentage points. This evidence suggests that integrating acronyms provides a nontrivial improvement in term conflation.

Highlights

  • Terms are linguistic representations of domain-specific concepts [1], [2]

  • Application context: the main goal of integrating acronym recognition into the multi-word term recognition process is to neutralize this type of term variation and its effects on term recognition

  • If we focus on recall as a way of comparing multiple systems against one another, it is worth noting that its denominator, i.e. the sum of true positives (TP) and false negatives (FN), which equals the number of relevant documents, is independent of the system and as such remains constant across all systems
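
The constant-denominator argument can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names, the document identifiers, and the two hypothetical systems are assumptions made purely for illustration. It shows that when every system is scored against the same set of relevant documents, ranking systems by recall gives the same order as ranking them by their true positive counts.

```python
def true_positive_count(retrieved, relevant):
    """Number of retrieved documents that are actually relevant (TP)."""
    return len(retrieved & relevant)


def recall(retrieved, relevant):
    """Recall = TP / (TP + FN); TP + FN is the number of relevant documents."""
    if not relevant:
        raise ValueError("the set of relevant documents must not be empty")
    return true_positive_count(retrieved, relevant) / len(relevant)


# Hypothetical relevance judgements shared by all systems (assumption).
relevant_docs = {"d1", "d2", "d3", "d4"}

# Hypothetical outputs of two systems for the same query (assumption).
system_outputs = {
    "without_acronyms": {"d1", "d2", "d9"},
    "with_acronyms": {"d1", "d2", "d3", "d9"},
}

recalls = {
    name: recall(docs, relevant_docs) for name, docs in system_outputs.items()
}
tp_counts = {
    name: true_positive_count(docs, relevant_docs)
    for name, docs in system_outputs.items()
}

# The denominator len(relevant_docs) is the same for every system,
# so sorting by recall and sorting by TP count give the same order.
ranking_by_recall = sorted(system_outputs, key=recalls.get, reverse=True)
ranking_by_tp = sorted(system_outputs, key=tp_counts.get, reverse=True)
```

Because only the numerator varies between systems, two systems can be compared on recall even when the full set of relevant documents is not known, as long as their true positive counts can be compared.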


Summary

Introduction

Terms are linguistic representations of domain-specific concepts [1], [2]. For practical purposes, terms are often defined as noun phrases that are frequently mentioned in a domain-specific discourse [3], [4]. Termhood implies that terms carry a heavier information load than other phrases used in a sublanguage, and as such they can be used to index and retrieve domain-specific documents, model domain-specific topics, identify text phrases useful for automatic summarization of domain-specific documents, identify slot fillers in information extraction, etc. It is therefore essential to build and maintain terminologies in order to enhance the performance of many text mining applications [5]. Automatic term recognition (ATR) methods are needed to efficiently annotate electronic documents with the set of terms they mention. One such method is FlexiTerm, which implements an unsupervised approach to the extraction of multi-word terms from a domain-specific corpus [6]. It performs term recognition in three steps: 1. Lexico-syntactic filtering is used to select multi-word term candidates

