Abstract

Textual datasets (corpora) are crucial for the application of natural language processing (NLP) models. However, corpus creation in the medical field is challenging, primarily because of privacy issues with raw clinical data such as health records. Thus, the existing clinical corpora are generally small and scarce. Medical NLP (MedNLP) methodologies perform well with limited data availability. We present the outcomes of the Real-MedNLP workshop, which was conducted using limited and parallel medical corpora. Real-MedNLP exhibits three distinct characteristics: (1) Limited Annotated Documents: The training data comprises only a small set (approximately 100) of case reports (CRs) and radiology reports (RRs) that have been annotated. (2) Bilingually Parallel: The constructed corpora are parallel in Japanese and English. (3) Practical Tasks: The workshop addresses fundamental tasks, such as named entity recognition and applied practical tasks. We propose three tasks: named entity recognition (NER) of approximately 100 available documents (Task 1), NER based only on annotation guidelines for humans (Task 2), and clinical applications (Task 3) consisting of adverse drug effects (ADE) detection for CRs and identical case identification (CI) for RRs. Nine teams participated in this study. The best systems achieved 0.65 and 0.89 F1-scores for CRs and RRs in Task 1, whereas the top scores in Task 2 decreased by 50-70%. In Task 3, ADE reports were detected by up to 0.64 F1-score, and CI scored up to 0.96 binary accuracy. Most systems adopt medical-domain-specific pre-trained language models using data augmentation methods. Despite the challenge of limited corpus size in Tasks 1 and 2, recent approaches are promising because the partial match scores reached approximately 0.8-0.9 F1-scores. Task 3 applications revealed that the different availabilities of external language resources affected the performance per language.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.