Abstract

Background: Semantic similarity is a valuable tool for analysis in biomedicine. When applied to phenotype profiles derived from clinical text, it has the capacity to enable and enhance ‘patient-like me’ analyses, automated coding, differential diagnosis, and outcome prediction. While a large body of work explores the use of semantic similarity for multiple tasks, including protein interaction prediction and rare disease differential diagnosis, there is less work exploring comparison of patient phenotype profiles for clinical tasks. Moreover, there are no experimental explorations of optimal parameters or better methods in the area.

Methods: We develop a platform for reproducible benchmarking and comparison of experimental conditions for patient phenotype similarity. Using the platform, we evaluate the task of ranking shared primary diagnosis from uncurated phenotype profiles derived from all text narrative associated with admissions in the Medical Information Mart for Intensive Care (MIMIC-III).

Results: We evaluated 300 semantic similarity configurations, as well as one embedding-based approach. On average, measures that did not make use of an external information content measure performed slightly better. However, the best-performing configurations, when measured by area under the receiver operating characteristic curve and Top Ten Accuracy, used term-specificity and annotation-frequency measures.

Conclusion: We identified and interpreted the performance of a large number of semantic similarity configurations for the task of classifying diagnosis from text-derived phenotype profiles in one setting. We also provided a basis for further research on other settings and related tasks in the area.

Highlights

  • To calculate the semantic similarity scores, we used the Semantic Measures Library toolkit (SML) [32], and we explored every combination of information content, pairwise, and groupwise similarity measures available within the library

  • To assess the extent to which a lack of matching diagnoses for some admissions in our sample negatively affected scores, we evaluated a modified Mean Reciprocal Rank (MRR) metric that excluded reciprocal rank values of 0, since a value of 0 indicates that there were no admissions with a matching primary diagnosis
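The modified MRR described in the last highlight can be illustrated with a minimal Python sketch. It assumes each query admission yields a ranked list of booleans marking whether the admission at that rank shares the query's primary diagnosis; the function names and the `drop_zeros` flag are hypothetical and are not taken from the authors' platform.

```python
from typing import Optional, Sequence


def reciprocal_rank(ranked_matches: Sequence[bool]) -> float:
    """Return 1/rank of the first match, or 0.0 when nothing matches."""
    for rank, is_match in enumerate(ranked_matches, start=1):
        if is_match:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(
    queries: Sequence[Sequence[bool]], drop_zeros: bool = False
) -> Optional[float]:
    """Average the reciprocal ranks over all queries.

    With drop_zeros=True, queries that have no matching admission
    (reciprocal rank 0) are excluded before averaging, which is the
    assumed reading of the 'modified' metric. Returns None when no
    scores remain to average.
    """
    scores = [reciprocal_rank(q) for q in queries]
    if drop_zeros:
        scores = [s for s in scores if s > 0.0]
    return sum(scores) / len(scores) if scores else None


# Illustrative data only: three query admissions, the last has no match.
example = [[False, True, False], [True, False], [False, False]]
standard = mean_reciprocal_rank(example)                   # 0.5
modified = mean_reciprocal_rank(example, drop_zeros=True)  # 0.75
```

Comparing the two values shows how much of a low score is attributable to queries that had no possible match, rather than to poor ranking.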


Summary

Introduction

Semantic similarity is a valuable tool for analysis in biomedicine. When applied to phenotype profiles derived from clinical text, it has the capacity to enable and enhance ‘patient-like me’ analyses, automated coding, differential diagnosis, and outcome prediction. These representations facilitate secondary analyses that make use of background knowledge encoded in or linked by ontologies. One such method for semantic analysis is semantic similarity: a class of methods that leverage the structural features of ontologies to calculate numerical measures of similarity between classes or sets of classes [4, 5]. These methods have been widely explored for, amongst others, prediction of protein–protein interaction [6], disease gene prioritisation [7, 8], and rare disease diagnosis [9]. Another work explored the use of uncurated text-derived phenotypes for differential diagnosis of common diseases [12].

Slater et al. BMC Medical Informatics and Decision Making (2022) 22:33
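To make the idea of ontology-based similarity concrete, the following is a minimal Python sketch under stated assumptions. It uses a toy, hypothetical ontology, an annotation-frequency information content, Resnik pairwise similarity, and a best-match-average groupwise measure. These are common choices for each of the three components, but the sketch does not reproduce the Semantic Measures Library API or any specific configuration evaluated in the paper.

```python
import math
from typing import Dict, List, Set

# Hypothetical toy ontology: each term maps to its direct parents.
PARENTS: Dict[str, Set[str]] = {
    "root": set(),
    "cardiac": {"root"},
    "respiratory": {"root"},
    "arrhythmia": {"cardiac"},
    "tachycardia": {"arrhythmia"},
    "dyspnoea": {"respiratory"},
}


def ancestors(term: str) -> Set[str]:
    """Return the term together with all of its ancestors."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(profiles: List[Set[str]]) -> Dict[str, float]:
    """Annotation-frequency IC: -log of the fraction of profiles that
    mention a term or any of its descendants."""
    counts = {term: 0 for term in PARENTS}
    for profile in profiles:
        closure = set().union(*(ancestors(t) for t in profile))
        for term in closure:
            counts[term] += 1
    total = len(profiles)
    return {t: -math.log(c / total) for t, c in counts.items() if c > 0}


def resnik(a: str, b: str, ic: Dict[str, float]) -> float:
    """Pairwise similarity: IC of the most informative common ancestor.
    Terms never seen in the corpus are treated as IC 0 (an assumption)."""
    common = ancestors(a) & ancestors(b)
    return max((ic.get(t, 0.0) for t in common), default=0.0)


def best_match_average(p: Set[str], q: Set[str], ic: Dict[str, float]) -> float:
    """Groupwise similarity between two non-empty phenotype profiles."""
    def directed(xs: Set[str], ys: Set[str]) -> float:
        return sum(max(resnik(x, y, ic) for y in ys) for x in xs) / len(xs)
    return 0.5 * (directed(p, q) + directed(q, p))


# Illustrative corpus of four patient profiles (invented data).
corpus = [
    {"tachycardia"},
    {"arrhythmia", "dyspnoea"},
    {"dyspnoea"},
    {"tachycardia", "dyspnoea"},
]
ic = information_content(corpus)
score = best_match_average(corpus[0], corpus[1], ic)
```

In this sketch the two profiles are similar only through the shared `arrhythmia` ancestor, so their score is lower than the score of a profile compared with itself. Swapping the information content, pairwise, or groupwise function for another yields a different configuration of the kind the benchmark compares.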

