Abstract

Introduction: Unsupervised machine learning (UML) applied to high-dimensional data has been used to discover cardiovascular disease subtypes; however, the reproducibility of subtypes identified by different algorithms has not been explored. We compared the ability of several promising UML and clustering algorithms to identify heart failure (HF) subtypes using high-dimensional electronic health record (EHR) data.

Methods: Using the Penn Medicine EHR, we identified all patients who had > 2 instances of an ICD-10-CM HF diagnosis. We extracted 1272 EHR-based features (vital signs, demographics, echocardiographic measurements, laboratory values, comorbidities) from the time of HF diagnosis and limited the cohort based on data completeness (n=8569). We selected the following methods based on prior success in simulation studies and used them to identify HF subtypes: Similarity Network Fusion (SNF), Locally Linear Embedding (LLE), Modified LLE, Uniform Manifold Approximation and Projection (UMAP), and Principal Component Analysis (PCA), followed by several clustering algorithms including K-means, density-based spatial clustering of applications with noise (DBSCAN), and spectral clustering. Cluster numbers (K) from 2 to 12 were evaluated. Clustering performance was assessed by silhouette score and visual separation.

Results: Model visualizations are shown in the Figure. The highest silhouette score achieved by each model varied widely, from 0.02 to 0.62; the optimal cluster number ranged from 2 to 4 across models. Normalization and standardization of continuous data did not significantly alter silhouette scores or the optimal cluster number.

Conclusions: HF subtypes identified through UML applied to EHR data may vary substantially depending on the algorithms used. Benchmarking strategies to evaluate the reproducibility of UML in the EHR are needed to ensure valid HF patient stratification and phenotypic refinement.
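Because the comparison above rests on the silhouette score, a minimal sketch of how that metric is computed may help. For each point, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to the members of any other cluster, and the point's silhouette is (b - a) / max(a, b); the score is the mean over all points and lies between -1 and 1. The abstract does not state which implementation or distance metric was used, so the function below, its Euclidean distance, and the six toy points are illustrative assumptions, not the study's code or data.

```python
import math
from collections import defaultdict


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance assumed)."""
    # Group point indices by cluster label.
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette score needs at least two clusters")

    def mean_dist(i, members):
        # Mean distance from point i to the given members, excluding i itself.
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # common convention: a singleton cluster contributes 0
        a = mean_dist(i, own)  # cohesion within the point's own cluster
        b = min(  # separation from the nearest other cluster
            mean_dist(i, members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Hypothetical toy data: two well-separated groups of three 2-D points.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good_labels = [0, 0, 0, 1, 1, 1]  # labels that match the two groups
bad_labels = [0, 1, 0, 1, 0, 1]   # labels that mix the two groups

good_score = silhouette_score(toy_points, good_labels)
bad_score = silhouette_score(toy_points, bad_labels)
print(f"matching labels: {good_score:.2f}, mixed labels: {bad_score:.2f}")
```

In a comparison like the one described, a score of this kind would be computed for each candidate cluster number from 2 to 12 and the highest value kept per model. Scores near 0, such as the 0.02 reported at the low end, indicate clusters that overlap almost entirely.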
