Abstract

Certain diseases show strong comorbidity and frequently co-occur with others. Understanding disease–disease associations can increase healthcare providers' awareness of co-occurring conditions and facilitate earlier diagnosis, prevention, and treatment. In this study, we used the large longitudinal electronic health record (EHR) dataset of The Guideline Advantage (TGA), collected from 70 outpatient clinics across the United States, to investigate potential disease–disease associations. Specifically, the 50 most prevalent disease diagnoses were manually identified among 165,732 unique patients. To investigate co-occurrence or dependency associations among the 50 diseases, the categorical disease terms were first mapped to numerical vectors based on disease co-occurrence frequency in individual patients using the Word2Vec approach. Novel disease association clusters were then identified using correlation and clustering analyses in this numerical space. Moreover, the distribution of the time delay (Δt) between pairs of strongly associated diseases (correlation coefficient ≥ 0.5) was calculated to show the dependency among the diseases. The results can indicate the risk of disease comorbidity and complications, and can facilitate disease prevention and optimal treatment decision-making.
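The vector-then-correlate step described above can be illustrated with a small sketch. This is not the study's pipeline: raw within-patient co-occurrence counts stand in for the Word2Vec embedding, the clustering step is omitted, and the patient records, disease labels, and function names are all hypothetical. Only the 0.5 correlation threshold is taken from the text.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy data: one list of diagnosis labels per patient.
patients = [
    ["hypertension", "diabetes", "hyperlipidemia"],
    ["hypertension", "diabetes"],
    ["asthma", "allergic rhinitis"],
    ["asthma", "allergic rhinitis", "hypertension"],
    ["diabetes", "hyperlipidemia"],
]

def cooccurrence_vectors(records):
    """Map each disease to a vector of within-patient co-occurrence counts.

    Assumption: this count vector is a simple stand-in for a learned
    Word2Vec embedding. The diagonal holds each disease's own prevalence.
    """
    diseases = sorted({d for rec in records for d in rec})
    index = {d: i for i, d in enumerate(diseases)}
    vectors = {d: [0] * len(diseases) for d in diseases}
    for rec in records:
        unique = sorted(set(rec))
        for d in unique:
            vectors[d][index[d]] += 1
        for a, b in combinations(unique, 2):
            vectors[a][index[b]] += 1
            vectors[b][index[a]] += 1
    return vectors

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0

def strong_pairs(vectors, threshold=0.5):
    """Return (disease_a, disease_b, r) for every pair with r >= threshold."""
    pairs = []
    for a, b in combinations(sorted(vectors), 2):
        r = pearson(vectors[a], vectors[b])
        if r >= threshold:
            pairs.append((a, b, r))
    return pairs

pairs = strong_pairs(cooccurrence_vectors(patients))
for a, b, r in pairs:
    print(f"{a} ~ {b}: r = {r:.2f}")
```

On real data, the count vectors would be replaced by learned embeddings, and the resulting correlation matrix would feed a clustering step.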

Highlights

  • Many diseases have strong associations with others and often co-occur within patients

  • Unlike in-hospital patients’ electronic health record (EHR) data, The Guideline Advantage (TGA) data provide a unique resource for understanding potential long-term disease–disease associations (on the order of hundreds of days), which are important for disease diagnosis, prevention, and treatment decision-making

  • To study the associations between different diseases within patients, we selected patients with more than five unique Clinical Classifications Software (CCS) category codes (n = 165,732)


Summary

Introduction

Many diseases have strong associations with others and often co-occur within patients. In prior work, large-scale EHR datasets (covering 35 million patients) were analyzed and combined with genome-wide association study (GWAS) data, which indicate disease–gene associations, to uncover novel disease–disease and disease–gene associations [7]. Unlike in-hospital patients’ EHR data, the TGA data provide a unique resource for understanding potential long-term disease–disease associations (on the order of hundreds of days), which are important for disease diagnosis, prevention, and treatment decision-making. In this exploratory study, we identified the 50 most prevalent disease diagnoses from 165,732 unique patients. We investigated the distribution of the time delay (Δt) of co-occurrence or dependency between 32 strongly associated disease pairs to understand the risk of comorbidity and complications and to facilitate disease prevention and optimal treatment decision-making.
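As a rough illustration of the Δt analysis, the sketch below computes the per-patient time delay for one disease pair. The text here does not define Δt precisely, so the sketch assumes it is the number of days from a patient's first recorded diagnosis of one disease to the first recorded diagnosis of the other. The records, identifiers, and the `time_delays` helper are hypothetical.

```python
from datetime import date
from statistics import median

# Hypothetical records: patient id -> {disease: date of first diagnosis}.
first_dx = {
    "p1": {"hypertension": date(2015, 1, 10), "diabetes": date(2015, 7, 9)},
    "p2": {"hypertension": date(2016, 3, 1), "diabetes": date(2016, 2, 20)},
    "p3": {"hypertension": date(2014, 5, 5)},
    "p4": {"hypertension": date(2013, 1, 1), "diabetes": date(2014, 1, 1)},
}

def time_delays(records, disease_a, disease_b):
    """Return delta-t in days (first B minus first A) per patient.

    Only patients with both diagnoses contribute. A negative value means
    disease_b was recorded before disease_a.
    """
    delays = []
    for diagnoses in records.values():
        if disease_a in diagnoses and disease_b in diagnoses:
            delta = diagnoses[disease_b] - diagnoses[disease_a]
            delays.append(delta.days)
    return delays

delays = time_delays(first_dx, "hypertension", "diabetes")
print(delays)          # per-patient delays in days
print(median(delays))  # one summary statistic of the distribution
```

Keeping the sign of Δt preserves the order in which the two diagnoses appeared, which is what a dependency reading of the distribution relies on.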
