Embedding-based terminology expansion via secondary use of large clinical real-world datasets

Amila Kugic, Bastian Pfeifer, Stefan Schulz, Markus Kreuzthaler

doi:10.1016/j.jbi.2023.104497

Open Access

Journal: Journal of Biomedical Informatics
Publication Date: Sep 29, 2023
Citations: 1
License type: cc-by-nc-nd

Affiliation: Medical University of Graz

Abstract

A log-likelihood-based co-occurrence analysis of ∼1.9 million de-identified ICD-10 codes and their related short textual problem list entries generated possible term candidates at a significance level of p < 0.01. The top 10 term candidates, consisting of 1- to 5-grams, were used as seed terms for an embedding-based nearest-neighbor approach that retrieved additional synonyms, hypernyms, and hyponyms from the respective n-gram embedding spaces of two different language models. The aim was to analyze the lexicality of the resulting term candidates and to compare the term classifications obtained with both models. We found no difference in system performance between processing lexical and non-lexical content (e.g., abbreviations and acronyms). Additionally, an application-oriented analysis of the SapBERT (Self-Alignment Pretraining for Biomedical Entity Representations) language model indicates suitable performance for extracting all term classes considered, i.e., synonyms, hypernyms, and hyponyms.
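The abstract does not state which log-likelihood statistic drives the co-occurrence analysis. A common choice for this kind of task is Dunning's log-likelihood ratio (G²) on a 2×2 contingency table, compared against the χ² critical value with one degree of freedom (about 6.635 for p < 0.01). The Python sketch below assumes that setup: each problem list entry is paired with one ICD-10 code, entries are lowercased and whitespace-tokenized into 1- to 5-grams, each n-gram is counted at most once per entry, and n-grams are ranked per code. The function names (`g2`, `ngrams`, `term_candidates`) and the positive-association filter are hypothetical illustrations, not the authors' implementation.

```python
import math
from collections import Counter

# Assumed threshold: chi-squared critical value, 1 degree of freedom, p < 0.01.
CHI2_CRIT_P01 = 6.635


def g2(k11, k12, k21, k22):
    """Dunning's log-likelihood ratio G^2 for a 2x2 contingency table.

    k11: entries with the code AND the n-gram
    k12: entries with the code but NOT the n-gram
    k21: entries with the n-gram but NOT the code
    k22: entries with neither
    """
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    total = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # cells with 0 observations contribute 0 (0 * log 0 := 0)
                e = rows[i] * cols[j] / n  # expected count under independence
                total += o * math.log(o / e)
    return 2.0 * total


def ngrams(tokens, n_max=5):
    """Set of all contiguous 1- to n_max-grams of a token list (space-joined)."""
    return {" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)}


def term_candidates(pairs, code, top_k=10, crit=CHI2_CRIT_P01):
    """Rank n-grams by G^2 association with one ICD-10 code (hypothetical helper).

    pairs: list of (icd10_code, problem_list_text) tuples.
    Returns up to top_k (ngram, G^2) tuples that exceed `crit` and co-occur
    with the code more often than expected under independence.
    """
    n = len(pairs)
    docs = [(c, ngrams(text.lower().split())) for c, text in pairs]
    n_code = sum(1 for c, _ in docs if c == code)
    gram_total = Counter(g for _, gs in docs for g in gs)
    gram_with_code = Counter(g for c, gs in docs if c == code for g in gs)
    scored = []
    for g, k11 in gram_with_code.items():
        k12 = n_code - k11
        k21 = gram_total[g] - k11
        k22 = n - n_code - k21
        # Skip negative associations: joint count not above its expectation.
        if k11 * n <= n_code * gram_total[g]:
            continue
        score = g2(k11, k12, k21, k22)
        if score > crit:
            scored.append((g, score))
    # Highest G^2 first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda x: (-x[1], x[0]))
    return scored[:top_k]
```

For example, with 20 entries coded E11 reading "Diabetes mellitus type 2" and 20 entries coded I10 reading "Arterial hypertension", `term_candidates(pairs, "E11")` returns the ten n-grams of the diabetes entry, all tied at G² = 80 ln 2 ≈ 55.5. Ranking by G² rather than raw frequency favors n-grams that are specific to a code over ones that are merely common.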
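The nearest-neighbor expansion of a seed term can be illustrated with cosine similarity over an n-gram embedding space. The sketch below assumes that every candidate n-gram has already been mapped to a vector, for instance by encoding it with a language model such as SapBERT. The three-dimensional `TOY_VOCAB` vectors and the helper names `cosine` and `expand_seed` are invented for illustration only. Deciding whether a retrieved neighbor is a synonym, hypernym, or hyponym is a separate classification step that this sketch does not attempt.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def expand_seed(seed, vocab, k=5, min_sim=0.0):
    """Return the k nearest neighbours of `seed` in an n-gram embedding space.

    vocab: dict mapping n-gram string -> embedding vector (list of floats).
    Neighbours below `min_sim` are dropped; the seed itself is excluded.
    """
    if seed not in vocab:
        raise KeyError(seed)
    q = vocab[seed]
    scored = [(term, cosine(q, vec)) for term, vec in vocab.items() if term != seed]
    scored = [x for x in scored if x[1] >= min_sim]
    # Most similar first; ties broken alphabetically.
    scored.sort(key=lambda x: (-x[1], x[0]))
    return scored[:k]


# Hypothetical toy embeddings, including abbreviations ("dm", "htn").
TOY_VOCAB = {
    "diabetes mellitus": [1.0, 0.1, 0.0],
    "dm": [0.9, 0.2, 0.0],
    "type 2 diabetes": [0.95, 0.0, 0.1],
    "hypertension": [0.0, 1.0, 0.0],
    "htn": [0.1, 0.9, 0.0],
}
```

Here `expand_seed("diabetes mellitus", TOY_VOCAB, k=2)` returns "dm" and "type 2 diabetes", while the hypertension terms rank far lower. Because the lookup works on vectors alone, abbreviations and full forms are handled the same way, provided the embedding model places them close together.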

Similar Papers

Role of Large Clinical Datasets From Physiologic Monitors in Improving the Safety of Clinical Alarm Systems and Methodological Considerations: A Case From Philips Monitors.
Azizeh Khaled Sowan ... Nancy Staggers
JMIR human factors | VOL. 3
30 Sep 2016

Applying MetaMap to Medline for identifying novel associations in a large clinical dataset: a feasibility analysis.
David A Hanauer ... Kai Zheng
Journal of the American Medical Informatics Association | VOL. 21
13 Jun 2014

Are AI language models such as ChatGPT ready to improve the care of individuals with epilepsy?
Christian M Boßelmann ... Dennis Lal
Epilepsia | VOL. 64
13 Mar 2023

Neural language models as psycholinguistic subjects: Representations of syntactic state
Richard Futrell ... Peng Qian
01 Jan 2019
