Abstract

This chapter discusses the problem of recognizing multiword units (MWUs) in medical-domain texts written in Croatian. MWUs have been studied by many authors since before the Natural Language Processing era, which has only broadened interest in them across multiple dimensions and directions. Spasic et al. give an overview of rule-based approaches to different levels of analysis of medical texts, ranging from simple regular expressions to commercial healthcare-oriented tools such as ClearForest, LEXIMER, and AeroText. Health care produces an abundance of free-form medical texts, yet these are almost impossible to obtain, even for research purposes. The creation of this lexicon is an ongoing project divided into several phases. In previous phases, the Croatian medical corpus was collected; it is now being made available continuously through the Sketch Engine interface as the documents are tagged with domain and subdomain markers.
