Abstract
Natural Language Processing (NLP) applications on real-life textual content require suitable, fit-for-purpose corpora that can accommodate the ambiguity of the domain. Assisted by domain experts and linguists, researchers in the field have built gold-standard corpora in many domains and for a variety of tasks. The wealth of information buried in free-text electronic documents in healthcare systems makes them a leading candidate for NLP applications. This literature review explores efforts to create and use a clinically annotated corpus for a particular healthcare information extraction task. Such efforts can be more pronounced when undertaken in a language with few existing gold-standard clinical references. A great many people around the globe interact with healthcare systems in languages other than English. Advancing clinical NLP research in their languages will considerably further both general progress in the field and its potential healthcare benefits. For the purposes of this review, we considered three major world languages: Spanish, Italian, and Chinese. The research question is therefore whether it is viable to create or use a gold-standard clinical corpus in a language other than English, and how such a corpus can contribute to performing a complex clinical language mining task. The implementations reviewed took varying approaches to overcoming the complexities of biomedical NLP in these languages. This study highlights novel solutions to complex tasks and finds that efforts in these languages can be highly successful when a non-English medical corpus is created from scratch, off-the-shelf tools are used, or machine translation is applied to bridge the gap in domain-specific biomedical NLP language resources.