Background/Objectives: Clinical trials frequently employ diverse terminologies and definitions to describe similar outcomes, leading to ambiguity and inconsistency in data interpretation. Addressing this variability in clinical outcome reporting and integrating semantically similar outcomes is important for healthcare and clinical research. Variability in outcome reporting not only hinders the comparability of clinical trial results but also poses significant challenges for evidence synthesis, meta-analysis, and evidence-based decision-making. Methods: This study investigates variability reduction in outcome measure reporting using rule-based and large language model-based approaches, aiming to mitigate the challenges associated with variable outcome reporting by comparing the two. The first, rule-based approach leverages well-known ontologies; the second exploits Sentence-BERT (SBERT, sentence-level bidirectional encoder representations from transformers) to identify semantically similar outcomes, with a Generative Pre-trained Transformer (GPT) used to refine the results. Results: The results show that only a relatively low percentage of outcomes can be linked to established ontologies under the rule-based approach. Analysis of outcomes by word count highlighted the absence of ontological linkage for three-word outcomes, indicating potential gaps in semantic representation. Conclusions: This study demonstrates that large language models (LLMs) can identify semantically similar outcomes, even those comprising more than three words, suggesting a crucial role for LLMs in outcome harmonization efforts, potentially reducing redundancy and enhancing data interoperability.
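As a rough illustration of the SBERT step described in the Methods, the following is a minimal sketch of embedding outcome strings and flagging semantically similar pairs by cosine similarity; the model name, similarity threshold, and example outcome strings are assumptions for illustration, not details taken from the study.

```python
# Minimal sketch: embed clinical outcome strings with SBERT and flag
# semantically similar pairs by cosine similarity.
# The model choice, threshold, and example outcomes are assumptions.
from sentence_transformers import SentenceTransformer, util

outcomes = [
    "overall survival",
    "survival rate at 5 years",
    "health-related quality of life",
    "quality of life",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice
embeddings = model.encode(outcomes, convert_to_tensor=True)

# Pairwise cosine similarity between all outcome embeddings
similarity = util.cos_sim(embeddings, embeddings)

THRESHOLD = 0.6  # assumed cutoff for "semantically similar"
for i in range(len(outcomes)):
    for j in range(i + 1, len(outcomes)):
        if similarity[i][j] >= THRESHOLD:
            print(f"{outcomes[i]!r} ~ {outcomes[j]!r} "
                  f"(cos = {similarity[i][j]:.2f})")
```

Candidate pairs surfaced this way could then be passed to a GPT-style model for refinement, as the abstract describes, before being merged in a harmonization step.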