Abstract

The study of diagnostic associations poses many methodological problems for the application of machine learning algorithms, among the most prominent of which are collinearity and wide variability. To overcome these, we propose and test the use of uniform manifold approximation and projection (UMAP), a recent and popular dimensionality reduction technique. We demonstrate its effectiveness on a large Spanish clinical database of patients diagnosed with depression, applying UMAP before grouping the patients with a hierarchical agglomerative cluster analysis. By extensively studying its behavior and results, and validating them with purely unsupervised metrics, we show that the results are consistent with well-known relationships, which validates the applicability of UMAP for advancing the study of comorbidities.
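The clustering step of the pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the input is assumed to be an embedding already produced by UMAP (the projection itself is not reproduced here), and average linkage with Euclidean distance is assumed because the abstract does not name the linkage criterion. The function name `agglomerative_average` and the toy data are hypothetical.

```python
import numpy as np


def agglomerative_average(X, n_clusters):
    """Naive average-linkage agglomerative clustering (roughly O(n^3)).

    Hypothetical helper for illustration; returns integer labels 0..n_clusters-1.
    Assumes X is a low-dimensional embedding (e.g. UMAP output), one row per patient.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 1 <= n_clusters <= n:
        raise ValueError("n_clusters must be between 1 and the number of points")
    # Pairwise Euclidean distance matrix between all points
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    # Start with every point in its own cluster
    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        best_d, best_a, best_b = np.inf, 0, 1
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if d < best_d:
                    best_d, best_a, best_b = d, a, b
        # Merge the closest pair; best_b > best_a, so deleting best_b keeps best_a valid
        clusters[best_a].extend(clusters[best_b])
        del clusters[best_b]
    labels = np.empty(n, dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels


# Toy "embedding": two tight, well-separated groups of 10 points in 2-D
rng = np.random.default_rng(0)
embedding = np.vstack([rng.normal(0.0, 0.1, (10, 2)), rng.normal(5.0, 0.1, (10, 2))])
labels = agglomerative_average(embedding, n_clusters=2)
```

The quadratic pair search is only meant to make the merge rule visible; for a database of realistic size, a library routine such as SciPy's `scipy.cluster.hierarchy.linkage(..., method="average")` would be the practical choice.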

Highlights

  • Healthcare data are well known for their high complexity

  • To explore the distribution of the average silhouette coefficient, we first studied how it varied with the number of dimensions onto which uniform manifold approximation and projection (UMAP) projected the data

  • We aimed to address this problem through a procedure used in other fields—the application of a dimensionality reduction technique prior to a cluster analysis [5,6,7]—by applying a novel technique, UMAP [18], to a data set of Spanish adults diagnosed with depression
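The average silhouette coefficient mentioned in the highlights is an unsupervised score. For each point i, let a(i) be the mean distance to the other members of its own cluster and b(i) the smallest mean distance to the members of any other cluster; then s(i) = (b(i) - a(i)) / max(a(i), b(i)), and the reported value is the mean of s(i) over all points. The sketch below computes it with NumPy only; the function name is hypothetical, Euclidean distance is assumed, and singleton clusters are assigned s(i) = 0, a common convention. Reproducing the dimension sweep would, as an assumption, wrap this in a loop that projects the data with umap-learn's `umap.UMAP(n_components=d).fit_transform(X)` for each candidate `d`, clusters the result, and records the score; that loop is not shown. scikit-learn's `sklearn.metrics.silhouette_score` offers an equivalent library implementation.

```python
import numpy as np


def average_silhouette(X, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # singleton cluster: s(i) stays 0 by convention
        # a(i): mean distance to the other members of i's cluster (D[i, i] == 0)
        a = D[i, same].sum() / (n_same - 1)
        # b(i): lowest mean distance to the members of any other cluster
        b = min(D[i, labels == k].mean() for k in clusters if k != labels[i])
        denom = max(a, b)
        s[i] = 0.0 if denom == 0 else (b - a) / denom
    return float(s.mean())


X = [[0, 0], [0, 1], [10, 0], [10, 1]]
good = average_silhouette(X, [0, 0, 1, 1])  # compact, well-separated clusters
bad = average_silhouette(X, [0, 1, 0, 1])   # clusters that mix distant points
```

Higher values indicate more compact, better-separated clusters: here `good` is about 0.90, while `bad` is negative (about -0.45) because each point sits closer, on average, to the other cluster than to its own.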


Summary

Introduction

Healthcare data are well known for their high complexity. Working with them raises issues in almost every field where they are used. Among these problems, which include the lack of unified databases and the combination of different data sources with almost no standardized implementation, certain issues make the data difficult to handle with machine learning algorithms [1]. Machine learning is a term widely used in the literature and usually refers to methods that are able to learn to solve specific problems. In our case, when we refer to machine learning algorithms, we mean a specific type of data-processing tool applied to the healthcare field. When working with electronic health records (EHRs), we must face problems related to uneven data quality, the coexistence of structured and unstructured data, and extreme variability [2].
