Abstract

Background
Research on the overall relationships between diseases is needed. A disease network model can comprehensively show the complex relationships between diseases and their progression patterns. Existing disease networks have been estimated from univariate relative risks. We aimed to develop an unbiased disease network.

Methods
We used NHIS national data on 50 million patients, collected from 2008 to 2022. We used only primary diagnosis codes (three-digit codes) and defined 603 disease categories based on the special tabulation list for morbidity in the detailed subcategories of ICD-10, prior research, and the frequency of diseases in the dataset. To develop a more unbiased disease network, we first adjusted for age, sex, period, season, and encounter type, then added confounding disease exposures identified from a directed acyclic graph (DAG), and estimated relative risks (RR) by Poisson regression with inverse probability weighting (IPW). We identified statistically significant disease pairs (RR > 1.10, p < 0.001). Because the network is updated after each round, we repeated the IPW analysis for up to three rounds. The significant disease pairs identified in the last round were connected to build the overall network and subnetworks by age and sex.

Results
We repeated the process up to the third round for all 346,948 disease pairs and identified 9,024 pairs as significant. We constructed a disease network for the general population and subnetworks for each age and sex subgroup.

Conclusions
These disease networks represent the overall progression relationships between diagnosis codes.

Key messages
• We developed an unbiased disease network that shows progression patterns by adjusting for confounders and using IPW.
• The disease network can be used to extract features from diagnosis codes in EHR/claims data, to analyze causal effects, and to predict disease or healthcare usage using RR or graph embedding.
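The IPW step described in the Methods can be illustrated for a single disease pair with a minimal sketch. It is not the authors' implementation. The function name `ipw_relative_risk`, the helper `make`, and the toy counts are hypothetical and are not study data. The sketch assumes one categorical confounder, estimates the propensity score as the exposed fraction within each confounder stratum rather than from a fitted model, and takes the ratio of weighted risks. With a single binary exposure and no other covariates, that ratio equals the point estimate a weighted Poisson regression would give. The p < 0.001 test is omitted.

```python
from collections import defaultdict


def ipw_relative_risk(records):
    """IPW-adjusted relative risk for one exposure-outcome disease pair.

    records: list of (exposed, outcome, stratum), with exposed and outcome
    in {0, 1}. The stratum stands in for the adjustment set (assumption).
    """
    # Propensity P(exposed | stratum), estimated as the exposed fraction.
    counts = defaultdict(lambda: [0, 0])  # stratum -> [n_total, n_exposed]
    for exposed, _, stratum in records:
        counts[stratum][0] += 1
        counts[stratum][1] += exposed

    weighted_events = [0.0, 0.0]  # index 0: unexposed arm, 1: exposed arm
    weight_totals = [0.0, 0.0]
    for exposed, outcome, stratum in records:
        n_total, n_exposed = counts[stratum]
        p = n_exposed / n_total
        if p in (0.0, 1.0):
            continue  # positivity violated: stratum has only one arm
        weight = 1.0 / p if exposed else 1.0 / (1.0 - p)
        weighted_events[exposed] += weight * outcome
        weight_totals[exposed] += weight

    risk_exposed = weighted_events[1] / weight_totals[1]
    risk_unexposed = weighted_events[0] / weight_totals[0]
    return risk_exposed / risk_unexposed


def make(stratum, exposed, n, n_outcome):
    """Build n toy records, the first n_outcome of which have the outcome."""
    return [(exposed, 1 if i < n_outcome else 0, stratum) for i in range(n)]


# Toy data (made up): exposure is more common in the "old" stratum,
# which also has a higher baseline risk, so the crude RR is confounded.
records = (
    make("young", 1, 2, 1) + make("young", 0, 8, 2)
    + make("old", 1, 8, 6) + make("old", 0, 2, 1)
)

rr = ipw_relative_risk(records)
is_edge = rr > 1.10  # effect-size threshold from the Methods; no p-value here
print(round(rr, 3), is_edge)
```

In this toy example the crude risk ratio is 0.7 / 0.3, about 2.33, while the weighted estimate is about 1.67, which shows how weighting removes the part of the association explained by the stratum variable. A pair passing both the RR threshold and the significance test would become a directed edge in the network.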