Abstract

This paper presents MHeTRep, a multilingual medical terminology, and the methodology followed for its compilation. The multilingual terminology is organised into one vocabulary for each language. All the terms in the collection are semantically tagged with a tagset corresponding to the top categories of the Snomed-CT ontology. When possible, the individual terms are linked to their equivalents in the other languages. Even though many NLP resources and tools claim to be domain independent, their application to specific tasks is often restricted to specific domains; outside those domains their performance degrades notably. Because the accuracy of NLP resources drops heavily when they are applied in environments different from those for which they were built, they need to be tuned to the new environment. Having a domain terminology usually facilitates and accelerates the adaptation of general-domain NLP applications to a new domain. This is particularly important in medicine, a domain undergoing rapid expansion. The proposed method takes Snomed-CT as its starting point. From this point, and using 13 multilingual resources covering the most relevant medical concepts such as drugs, anatomy, clinical findings and procedures, we built a large resource covering seven languages and totalling more than two million semantically tagged terms. The resulting collection has been intensively evaluated in several ways for the languages and domain categories involved. Our hypothesis is that MHeTRep can be used advantageously over the original resources for a number of NLP use cases and can likely be extended to other languages.

Highlights

  • For many domain-restricted NLP tasks and applications, having semantically organised lexical resources is mandatory

  • Selecting the MeSH entries is straightforward, but mapping them into our tagset is far from easy because of two issues. Looking at Table 6, it seems intuitive that some top MeSH headings can be mapped directly into Snomed-CT top categories

  • Because building MHeTRep involves two tasks, getting the terms and classifying them, both have to be evaluated: a term is correct if it is a true term of the medical domain and it has been assigned to the correct category
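
The correctness criterion in the last highlight can be sketched in code. The following is a minimal illustration only, not the authors' evaluation procedure: the record layout, the function names and the sample judgements are all hypothetical, and the category labels are used purely as examples of Snomed-CT top categories.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class JudgedTerm:
    """One manually judged entry (hypothetical record layout)."""
    term: str
    language: str
    assigned_category: str         # tag carried by the entry in the resource
    is_medical_term: bool          # judgement: is this a true medical term?
    gold_category: Optional[str]   # judge's category, None if not a term


def is_correct(j: JudgedTerm) -> bool:
    # Both conditions must hold: true domain term AND correct category.
    return j.is_medical_term and j.assigned_category == j.gold_category


def precision_by_category(judged):
    """Fraction of correct entries among those assigned to each category."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for j in judged:
        totals[j.assigned_category] += 1
        hits[j.assigned_category] += int(is_correct(j))
    return {cat: hits[cat] / totals[cat] for cat in totals}


# Invented sample judgements, for illustration only.
sample = [
    JudgedTerm("aspirin", "en", "pharmaceutical", True, "pharmaceutical"),
    JudgedTerm("femur", "en", "procedure", True, "body structure"),
    JudgedTerm("table", "en", "body structure", False, None),
    JudgedTerm("apendicectomía", "es", "procedure", True, "procedure"),
]
scores = precision_by_category(sample)
```

Under this sketch, an entry that is a genuine medical term but carries the wrong tag ("femur" above) counts as an error, just like an entry that is not a medical term at all ("table").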


Summary

Introduction

For many domain-restricted NLP tasks and applications, having semantically organised lexical resources (terminologies, lexicons, ontologies, etc.) is mandatory. The medical (and more generally, the health) domain includes all activities related to the diagnosis, treatment and prevention of disease, illness, injury and other physical and mental impairments in humans. Health care is delivered by practitioners in medicine, chiropractic, dentistry, nursing, pharmacy, allied health and other care providers. The health care industry, the sector that provides goods and services to treat patients with curative, preventive, rehabilitative or palliative care, is of great importance and is probably the most important domain. In the context of this paper, we consider that all documents related, either directly or indirectly, to this field of activity constitute the medical domain. This paper aims to create a large collection of terms used in this domain in all types of publications and text genres.
