Abstract

Medical checkup data can help identify unknown factors of disease progression, but a precise analysis must take the causal relationships among checkup items into account. Because the items tested vary from person to person, untested items appear as missing values that must be appropriately imputed. In addition, the number of persons with the target disease or disorder is small compared with the total number of persons recorded, so medical checkup data are imbalanced. We propose a new method for analyzing causal relationships in medical checkup data to discover disease progression factors, based on the linear non-Gaussian acyclic model (LiNGAM), a machine learning technique for causal inference. The proposed method, referred to as LiNGAM-beta, compares specific regression coefficients calculated through LiNGAM to estimate the causal strength of checkup items on disease progression. We also propose an analysis framework for causal inference on medical checkup data that consists of LiNGAM-beta, collaborative filtering (CF), and a sampling approach. CF and the sampling approach are used for missing value imputation and for balancing the data distribution, respectively. We applied the proposed framework to medical checkup data to identify factors in the development of nonalcoholic fatty liver disease (NAFLD). Checkup items related to metabolic syndrome and age showed high causal effects on NAFLD severity. The blood urea nitrogen (BUN) level appeared to have a negative effect on NAFLD severity. Snoring frequency, which is associated with obstructive sleep apnea, affected NAFLD severity, particularly in the male group. Sleep duration also affected NAFLD severity in persons over fifty years old. These results are consistent with previous reports on the causes of NAFLD; for example, NAFLD and metabolic syndrome are bidirectionally related, and BUN has a negative effect on NAFLD progression. Our analysis results are therefore plausible. The proposed analysis framework, including LiNGAM-beta, can be applied to various medical checkup data and will contribute to discovering unknown disease factors.
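
To make the idea of comparing regression coefficients as causal strengths concrete, the following is a minimal sketch and not the authors' implementation. It assumes the causal order is already known, which is the part LiNGAM itself would estimate, and it simulates a linear model with non-Gaussian (uniform) noise. The names `item_a`, `item_b`, `severity`, `simulate`, and `ols_slopes`, and the coefficients 0.5, 0.8, and -0.3, are hypothetical and chosen only for illustration.

```python
import random


def simulate(n=5000, seed=0):
    """Draw samples from a toy linear acyclic model with uniform (non-Gaussian) noise.

    Assumed causal order (hypothetical): item_a -> item_b -> severity,
    with an additional direct edge item_a -> severity.
    """
    rng = random.Random(seed)

    def noise():
        return rng.uniform(-1.0, 1.0)

    item_a = [noise() for _ in range(n)]
    item_b = [0.5 * a + noise() for a in item_a]
    severity = [0.8 * a - 0.3 * b + noise() for a, b in zip(item_a, item_b)]
    return item_a, item_b, severity


def ols_slopes(x1, x2, y):
    """Least-squares slopes of y on x1 and x2 (with intercept), via centred normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


item_a, item_b, severity = simulate()
beta_a, beta_b = ols_slopes(item_a, item_b, severity)

# Rank the items by the magnitude of their estimated direct effect on severity.
ranking = sorted(
    [("item_a", beta_a), ("item_b", beta_b)],
    key=lambda pair: abs(pair[1]),
    reverse=True,
)
print(ranking)
```

In this toy setting the estimated slopes approximate the direct effects used to generate the data, and their signs indicate whether an item raises or lowers the simulated severity. With real checkup data the causal order is unknown and would have to be estimated first, and the items would also need imputation and class balancing as described above.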
