Abstract

A large number of clinical concepts are categorized under standardized formats that ease the manipulation, understanding, analysis, and exchange of information. One of the most widely used codifications is the International Classification of Diseases (ICD), which characterizes diagnoses and clinical procedures. With formatted ICD concepts, a patient profile can be described through a set of standardized attributes sorted by the relevance or chronology of events. This structured data is fundamental for quantifying the similarity between patients and detecting relevant clinical characteristics. Data visualization tools allow the representation and comprehension of data patterns, which are usually high-dimensional, so only a partial picture can be projected. In this paper, we provide a visual analytics approach for the identification of homogeneous patient cohorts by combining custom distance metrics with a flexible dimensionality reduction technique. First, we define a new metric that measures the similarity between diagnosis profiles through the concordance and relevance of events. Second, we describe a variation of the Simplified Topological Abstraction of Data (STAD) dimensionality reduction technique that enhances the projection of signals while preserving the global structure of the data. The MIMIC-III clinical database is used to implement the analysis in an interactive dashboard, providing a highly expressive environment for the exploration and comparison of patient groups with at least one identical diagnostic ICD code. The combination of the distance metric and STAD not only allows the identification of patterns but also provides a new layer of information to establish additional relationships between patient cohorts. The method and tool presented here add a valuable new approach for exploring heterogeneous patient populations. In addition, the distance metric described can be applied in other domains that employ ordered lists of categorical data.

Highlights

  • Patient profiling and selection are a crucial step in the setup of clinical trials

  • The structure is still preserved in both networks (Figs. 3D and 3E)

  • We applied this approach to the MIMIC-III database (Johnson et al., 2016), which is a publicly available dataset developed by the MIT Lab for Computational Physiology, containing anonymized health data from intensive care unit admissions between 2008 and 2014

  • To reduce the number of distinct terms in the list of diagnoses, International Classification of Diseases (ICD) codes were first grouped as described in the ICD guidelines (Healthcare Cost & Utilization Project, 2019)


Summary

Introduction

Patient profiling and selection are a crucial step in the setup of clinical trials. The process involves analytical methods to handle the increasing amount of healthcare data but is still [...]

How to cite this article: Alcaide D, Aerts J. 2021.

Patient similarity and distance measures for categorical events

Different distance metrics exist for unordered lists of categorical data, including the overlap coefficient (Vijaymeena & Kavitha, 2016), the Jaccard index (Real & Vargas, 1996), and the simple matching coefficient (Šulc & Řezanková, 2014). These methods compute the number of matched attributes between two lists using different criteria. Correlation between ordered lists cannot be calculated when the lists are of different lengths (Pereira, Waxman & Eyre-Walker, 2009).
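The three unordered measures named above can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the example diagnosis codes, and the `vocabulary` argument are assumptions made for illustration, and the simple matching coefficient is shown in its common binary presence/absence form, which may differ from the variant used in the cited work.

```python
def jaccard_index(a, b):
    """Shared codes divided by all distinct codes in either list."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0  # assumed convention for two empty lists
    return len(set_a & set_b) / len(union)


def overlap_coefficient(a, b):
    """Shared codes divided by the size of the smaller list."""
    set_a, set_b = set(a), set(b)
    smaller = min(len(set_a), len(set_b))
    if smaller == 0:
        return 0.0  # assumed convention when one list is empty
    return len(set_a & set_b) / smaller


def simple_matching_coefficient(a, b, vocabulary):
    """Fraction of codes in `vocabulary` on which both lists agree,
    counting joint presence and joint absence as agreement."""
    set_a, set_b = set(a), set(b)
    vocab = set(vocabulary)
    if not vocab:
        raise ValueError("vocabulary must not be empty")
    agreements = sum((code in set_a) == (code in set_b) for code in vocab)
    return agreements / len(vocab)


# Hypothetical diagnosis lists, written in order of relevance.
patient_a = ["428", "427", "584"]
patient_b = ["584", "428", "038", "518"]
# Hypothetical code universe: every code above plus one that neither patient has.
vocabulary = ["428", "427", "584", "038", "518", "401"]

print(jaccard_index(patient_a, patient_b))
print(overlap_coefficient(patient_a, patient_b))
print(simple_matching_coefficient(patient_a, patient_b, vocabulary))
```

All three functions convert the lists to sets, so reordering a patient's diagnoses leaves every score unchanged. That is the limitation that motivates a metric sensitive to the position of each event in the list.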

