Acronyms as an Integral Part of Multi-Word Term Recognition – A Token of Appreciation

Irena Spasic

doi:10.1109/access.2018.2807122

Abstract

Term conflation is the process of linking together different variants of the same term. In automatic term recognition approaches, all term variants should be aggregated into a single normalized term representative, which is associated with a single domain-specific concept as a latent variable. In a previous study, we described FlexiTerm, an unsupervised method for recognition of multi-word terms from a domain-specific corpus. It uses a range of methods to normalize three types of term variation: orthographic, morphological, and syntactic. Acronyms, which represent a highly productive type of term variation, were not supported. In this paper, we describe how the functionality of FlexiTerm has been extended to recognize acronyms and incorporate them into the term conflation process. The main contribution of this paper is not acronym recognition per se, but rather its integration with other types of term variation into the term conflation process. We evaluated the effects of term conflation in the context of information retrieval as one of its most prominent applications. On average, relative recall increased by 32 percentage points, whereas the index compression factor increased by 7 percentage points. Therefore, the evidence suggests that integration of acronyms provides a nontrivial improvement in term conflation.

Highlights

Terms are linguistic representations of domain-specific concepts [1], [2].
Application context: The main goal of integrating acronym recognition into the multi-word term recognition process is to neutralize this type of term variation and its effects on term recognition.
If we focus on recall as a way of comparing multiple systems against one another, it is worth noting that its denominator, i.e. the sum of true positives (TP) and false negatives (FN), which equals the number of relevant documents, is independent of the system and as such remains constant across all systems.
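The point about the recall denominator can be illustrated with a minimal sketch. The document IDs and the two systems below are hypothetical, invented purely for illustration; they are not data from the paper. Because both systems are scored against the same set of relevant documents, TP + FN is the same for each, so comparing their recall amounts to comparing their true positive counts.

```python
def recall(retrieved, relevant):
    """Recall = TP / (TP + FN) for one system's retrieved document set."""
    tp = len(retrieved & relevant)   # relevant documents the system found
    fn = len(relevant - retrieved)   # relevant documents the system missed
    return tp / (tp + fn)            # tp + fn == len(relevant) for every system


# Hypothetical document IDs (assumption, for illustration only).
relevant = {1, 2, 3, 4, 5}
system_a = {1, 2, 9}
system_b = {1, 2, 3, 4, 8}

recall_a = recall(system_a, relevant)
recall_b = recall(system_b, relevant)
print(recall_a, recall_b)
```

Both calls divide by the same value, the size of `relevant`, which is why the denominator can be ignored when ranking systems by recall on a shared collection.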

Summary

Introduction

Terms are linguistic representations of domain-specific concepts [1], [2]. For practical purposes, terms are often defined as noun phrases that are frequently mentioned in a domain-specific discourse [3], [4]. Termhood implies that terms carry a heavier information load than other phrases used in a sublanguage, and as such they can be used to index and retrieve domain-specific documents, model domain-specific topics, identify text phrases useful for automatic summarization of domain-specific documents, identify slot fillers in information extraction, etc. It is therefore essential to build and maintain terminologies in order to enhance the performance of many text mining applications [5]. Automatic term recognition (ATR) methods are needed to efficiently annotate electronic documents with the set of terms they mention. One such method is FlexiTerm, which implements an unsupervised approach to extraction of multi-word terms from a domain-specific corpus [6]. It performs term recognition in three steps: 1. Lexico-syntactic filtering is used to select multi-word term candidates.
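As a rough illustration of what lexico-syntactic filtering can look like, the sketch below scans a part-of-speech tagged sentence for sequences of adjectives and nouns that end in a noun. This is not FlexiTerm's actual filter: the pattern, the function name `multiword_candidates`, and the Penn Treebank style tags in the example are all assumptions made for illustration.

```python
import re


def multiword_candidates(tagged):
    """Return multi-word candidates from a list of (token, POS tag) pairs."""
    # Encode each tag as one character so a regex can scan the tag sequence:
    # 'J' = adjective, 'N' = noun, 'O' = anything else.
    codes = "".join(
        "J" if tag.startswith("JJ") else "N" if tag.startswith("NN") else "O"
        for _, tag in tagged
    )
    candidates = []
    # Assumed pattern: one or more adjectives/nouns followed by a head noun,
    # so every match spans at least two tokens.
    for match in re.finditer(r"[JN]+N", codes):
        tokens = [tok for tok, _ in tagged[match.start():match.end()]]
        candidates.append(" ".join(tokens))
    return candidates


# Hypothetical tagged sentence (tags assigned by hand for this example).
sentence = [
    ("the", "DT"), ("anterior", "JJ"), ("cruciate", "JJ"),
    ("ligament", "NN"), ("was", "VBD"), ("torn", "VBN"),
]
candidates = multiword_candidates(sentence)
print(candidates)
```

Encoding tags as single characters keeps the pattern readable and lets a standard regular expression do the matching; a real system would take its tags from a part-of-speech tagger and would likely use a richer pattern.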

Journal: IEEE Access	Publication Date: Jan 1, 2018
Citations: 55	License type: CC BY 3.0

Similar Papers

FlexiTerm: a flexible term recognition method.
Irena Spasić ... Mark Greenwood
Journal of Biomedical Semantics | VOL. 4
01 Jan 2013

Is Regularization Uniform across Linguistic Levels? Comparing Learning and Production of Unconditioned Probabilistic Variation in Morphology and Word Order
Carmen Saldana ... Jennifer Culbertson
Language Learning and Development | VOL. 17
19 Feb 2021

Encoder-Attention based Automatic Term Recognition (EA-ATR)
...
Zenodo (CERN European Organization for Nuclear Research) | VOL. -
01 Jan 2020

Syntactically Diverse Adversarial Network for Knowledge-Grounded Conversation Generation
Fuwei Cui ... Ze Liu
01 Jan 2020
