Background: Atrial fibrillation (AF) occurs in about one-fourth of patients with embolic stroke of undetermined source (ESUS). Accurate prediction of post-stroke AF at discharge from the index stroke admission can inform personalized strategies for post-stroke AF monitoring and intervention. While clinical risk scores can predict AF, machine learning (ML) models have shown superior performance. However, traditional ML approaches use only expert-derived predictors available in the electronic health record (EHR) and may therefore miss variables that could improve prediction accuracy.

Aims: This study aims to enhance AF prediction by augmenting expert-derived predictors with an unbiased selection of full diagnostic codes and medication histories up to the index stroke. Using embedding learning with hypergraph neural networks, we generate compact representations of this high-dimensional data that capture complex feature interactions and thereby improve prediction accuracy.

Methods: We analyzed data from 510 ESUS patients (55.3% female, mean age 61.4 years) treated at Emory Healthcare from 2015 to 2023. Our experiments use a logistic regression (LR) model to predict AF from different feature sets. The first baseline uses 58 clinically motivated predictors: comorbidities characterized by 17 ICD codes manually extracted based on the literature, plus 41 other features extracted from lab results, echocardiography, and ECG. To model the full history of comorbidities and medications directly, a second baseline uses all 1530 ICD codes plus the 41 other features (1571 features in total). In contrast, the embedding method condenses the 1530 ICD codes into an informative 32-dimensional embedding vector, yielding 32 + 41 = 73 features. To generate the embedding, a hypergraph neural network was trained on a larger stroke cohort (n = 7956) to model the interactions among the 1530 ICD codes.
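The abstract does not specify the hypergraph architecture, its readout, or its training objective. As a minimal sketch, assuming the HGNN convolution form of Feng et al. (2019) with ICD codes as nodes and each patient's set of codes as one hyperedge, the path from code embeddings to the 73-feature LR input could look like this. The names `hgnn_layer` and `patient_embeddings`, the mean-pooling readout, the hidden width of 64, the smaller toy training cohort, and all data are hypothetical, and the weights are untrained placeholders.

```python
import numpy as np

def hgnn_layer(X, H, Theta, relu=True):
    """One hypergraph convolution in the HGNN form (assumed, Feng et al., 2019):
    X' = act(Dv^-1/2 H De^-1 H^T Dv^-1/2 X Theta), with unit hyperedge weights.

    X: (n_codes, d_in) node features; H: (n_codes, n_patients) incidence matrix,
    H[v, e] = 1 if ICD code v appears in patient e's history; Theta: (d_in, d_out).
    """
    dv = H.sum(axis=1)  # node (code) degrees
    de = H.sum(axis=0)  # hyperedge (patient) degrees
    dv_is = np.where(dv > 0, 1.0 / np.sqrt(np.maximum(dv, 1.0)), 0.0)
    de_inv = np.where(de > 0, 1.0 / np.maximum(de, 1.0), 0.0)
    left = dv_is[:, None] * H * de_inv[None, :]  # Dv^-1/2 H De^-1
    right = H.T * dv_is[None, :]                 # H^T Dv^-1/2
    out = left @ (right @ (X @ Theta))           # avoids an n_codes x n_codes matrix
    return np.maximum(out, 0.0) if relu else out

def patient_embeddings(H, code_emb):
    """Mean-pool the embeddings of each patient's ICD codes (a hypothetical readout)."""
    counts = np.maximum(H.sum(axis=0), 1.0)  # guard against empty code histories
    return (H.T @ code_emb) / counts[:, None]

# Dimensions mirror the abstract; the data are random placeholders.
N_CODES, EMB_DIM, N_OTHER = 1530, 32, 41
rng = np.random.default_rng(0)

# Toy "training" hypergraph (300 random patients here instead of 7956).
H_train = (rng.random((N_CODES, 300)) < 0.01).astype(float)
X0 = np.eye(N_CODES)                                # one-hot feature per ICD code
Theta1 = rng.normal(scale=0.1, size=(N_CODES, 64))  # untrained weights
Theta2 = rng.normal(scale=0.1, size=(64, EMB_DIM))
code_emb = hgnn_layer(hgnn_layer(X0, H_train, Theta1), H_train, Theta2, relu=False)

# ESUS cohort: 32-dim ICD embedding + 41 other features = 73 columns.
H_esus = (rng.random((N_CODES, 510)) < 0.01).astype(float)
other = rng.normal(size=(510, N_OTHER))
features = np.hstack([patient_embeddings(H_esus, code_emb), other])
print(features.shape)  # (510, 73)
```

In a real pipeline the `Theta` weights would be learned on the larger stroke cohort and frozen before embedding the ESUS patients; only the shapes here follow the abstract.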
A nested cross-validation approach with 5-fold splits was used, and ROC-AUC scores were recorded.

Result: Among the 510 ESUS patients, 107 (21.0%) developed AF (mean age 67.9 years, 57% female). We compared the performance of the LR model across feature sets derived from ICD codes (Table 1). The results show that the learned 32-dimensional embedding vectors improve the prediction of post-ESUS AF.

Conclusion: The embedding technique can significantly enhance predictive performance by integrating comprehensive medical history, making fuller use of the available data to improve outcomes.
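The nested cross-validation protocol from the Methods can be sketched with scikit-learn. The abstract states only LR, nested cross-validation with 5-fold splits, and ROC-AUC; the inner 5-fold loop, the grid over the regularization strength `C`, the standardization step, the function name `nested_cv_auc`, and the synthetic data below are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def nested_cv_auc(X, y, seed=0):
    """Outer 5-fold CV estimates ROC-AUC; an inner 5-fold CV tunes C on each outer training split."""
    inner = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed + 1)
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000))
    grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}  # assumed search grid
    search = GridSearchCV(model, grid, scoring="roc_auc", cv=inner)
    return cross_val_score(search, X, y, scoring="roc_auc", cv=outer)

# Synthetic stand-in for a 73-feature matrix: 510 patients, 107 AF cases.
rng = np.random.default_rng(0)
y = np.zeros(510, dtype=int)
y[:107] = 1
rng.shuffle(y)
X = rng.normal(size=(510, 73))
X[:, :5] += y[:, None]  # inject signal into 5 columns so AUC is above chance

scores = nested_cv_auc(X, y)
print(f"ROC-AUC per outer fold: {np.round(scores, 3)}; mean {scores.mean():.3f}")
```

The same call could be repeated on the 58-, 1571-, and 73-feature matrices to compare feature sets; because `C` is tuned only on outer training splits, the outer-fold AUCs are not inflated by hyperparameter selection.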