Abstract
A significant proportion of the clinical record is in free text format, making it difficult to extract key information and make secondary use of patient data. Automatic detection of information within narratives initially requires humans, following specific protocols and rules, to identify medical entities of interest. The aim of this work was to build a linguistic resource of annotated medical entities on texts produced in Chilean hospitals. A clinical corpus was constructed using 150 referrals from public hospitals. Three annotators identified six types of medical entities: clinical findings, diagnoses, body parts, medications, abbreviations, and family members. An annotation scheme was designed, and an iterative approach was applied to train the annotators. The F1-Score metric was used to track the annotators' agreement during their training. An average F1-Score of 0.73 was observed at the beginning of the project; after the training period, it increased to 0.87. Annotation of clinical findings and body parts showed significant discrepancy, while abbreviations, medications, and family members showed high agreement. A linguistic resource with annotated medical entities on texts produced in Chilean hospitals was built and made available, working with annotators related to medicine. The iterative annotation approach allowed us to improve performance metrics. The corpus and annotation protocols will be released to the research community.
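As an illustration of how an F1-Score can serve as an agreement measure between two annotators, the following minimal sketch treats one annotator's entities as the reference and the other's as the prediction. The exact-match criterion (same start offset, end offset, and label), the function name `pairwise_f1`, the label strings, and the sample offsets are all assumptions made for this example; the abstract does not state which matching criterion or tooling was used.

```python
from typing import Set, Tuple

# An entity is (start offset, end offset, label); this representation is an assumption.
Entity = Tuple[int, int, str]


def pairwise_f1(annotations_a: Set[Entity], annotations_b: Set[Entity]) -> float:
    """Exact-match F1 between two annotators' entity sets for one document."""
    if not annotations_a and not annotations_b:
        return 1.0  # neither annotator marked anything: treated here as full agreement
    matched = len(annotations_a & annotations_b)  # entities both annotators marked identically
    if matched == 0:
        return 0.0
    precision = matched / len(annotations_b)  # annotator B scored against annotator A
    recall = matched / len(annotations_a)
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations on one referral (offsets and labels are made up).
annotator_1 = {(0, 5, "Diagnosis"), (10, 14, "Body_Part"), (20, 23, "Abbreviation")}
annotator_2 = {(0, 5, "Diagnosis"), (10, 16, "Body_Part"), (20, 23, "Abbreviation")}

score = pairwise_f1(annotator_1, annotator_2)
print(f"Pairwise F1: {score:.2f}")
```

Swapping the two annotators exchanges precision and recall, so the resulting F1 is the same in either direction. With three annotators, one option is to average the score over the three annotator pairs.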
Summary
PABLO BÁEZ1,a, FABIÁN VILLENA1,2,b, KAREN ZÚÑIGA3,c, NATALIA JONES3,c, GUSTAVO FERNÁNDEZ3,c, MANUEL DURÁN1,d, JOCELYN DUNSTAN1,2,e
Biomedical Sciences. b Dental Surgeon, Master in Medical Informatics. c Medical student, Universidad de Chile. d Physician, Master's student in Medical Informatics. e Physicist, PhD in Applied Mathematics and Theoretical Physics.
PBB is funded by project U-INICIA VID 2019 UI-004/19, cost center 7487, and by Project ICM P09-015F. JD, FV, and MD receive funding from CMM-ANID AFB 170001. FV and JD receive support from cost center 570111 - CIMTCORFO. The organizations that funded this work had no influence on the design of the study; on the collection, analysis, or interpretation of the data; or on the preparation, review, or approval of the manuscript.
Received April 7, 2020; accepted April 28, 2021.