Discovery of Depression-Associated Factors From a Nationwide Population-Based Survey: Epidemiological Study Using Machine Learning and Network Analysis

Sang Min Nam, Jee In Kang, Thomas A Peterson, Kyoung Yul Seo, Hyun Wook Han

doi:10.2196/27344

Abstract

Background
In epidemiological studies, finding the best subset of factors is challenging when the number of explanatory variables is large.

Objective
Our study had two aims. First, we aimed to identify essential depression-associated factors using the extreme gradient boosting (XGBoost) machine learning algorithm from big survey data (the Korea National Health and Nutrition Examination Survey, 2012-2016). Second, we aimed to achieve a comprehensive understanding of multifactorial features in depression using network analysis.

Methods
An XGBoost model was trained and tested to classify “current depression” versus “no lifetime depression” in a data set of 120 variables for 12,596 cases. The optimal XGBoost hyperparameters were set by an automated machine learning tool (TPOT), and a high-performance sparse model was obtained by feature selection using the XGBoost feature importance values. We performed statistical tests on the model and nonmodel factors using survey-weighted multiple logistic regression and drew a correlation network among the factors. We also tested for confounding or interaction effects of selected risk factors when such effects were suspected from the network.

Results
The XGBoost-derived depression model consisted of 18 factors, with an area under the weighted receiver operating characteristic curve of 0.86. Two nonmodel factors could be found using the model factors, and the factors were classified as direct (P<.05) or indirect (P≥.05) according to the statistical significance of their association with depression. Perceived stress and asthma were the most remarkable risk factors, and urine specific gravity was a novel protective factor. The depression-factor network showed clusters of socioeconomic status and quality-of-life factors and suggested that educational level and sex might be predisposing factors. Indirect factors (eg, diabetes, hypercholesterolemia, and smoking) were involved in confounding or interaction effects of direct factors. Triglyceride level was a confounder of hypercholesterolemia and diabetes, smoking was a significant risk factor in females, and weight gain was associated with depression involving diabetes.

Conclusions
XGBoost and network analysis were useful for discovering depression-related factors and their relationships and can be applied to epidemiological studies using big survey data.
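The abstract describes obtaining a sparse model by feature selection on XGBoost importance values and reporting an area under the weighted ROC curve, but it does not spell out the selection rule. The Python sketch below assumes one plausible rule: rank features by importance and drop the least important ones while the weighted AUC stays within a tolerance of the full model's. The names `weighted_auc`, `select_by_importance`, and the `score_fn` callback are hypothetical, not from the paper; in practice `score_fn` would retrain the TPOT-tuned XGBoost model on a feature subset and return its weighted test AUC.

```python
from typing import Callable, Dict, List, Sequence


def weighted_auc(scores: Sequence[float], labels: Sequence[int],
                 weights: Sequence[float]) -> float:
    """Area under the weighted ROC curve.

    Equals the weighted probability that a case (label 1) scores higher
    than a control (label 0); each case-control pair is weighted by the
    product of their survey weights, and ties count as one half.
    """
    cases = [(s, w) for s, y, w in zip(scores, labels, weights) if y == 1]
    controls = [(s, w) for s, y, w in zip(scores, labels, weights) if y == 0]
    concordant = 0.0
    for s_case, w_case in cases:
        for s_ctrl, w_ctrl in controls:
            if s_case > s_ctrl:
                concordant += w_case * w_ctrl
            elif s_case == s_ctrl:
                concordant += 0.5 * w_case * w_ctrl
    total = sum(w for _, w in cases) * sum(w for _, w in controls)
    return concordant / total


def select_by_importance(importances: Dict[str, float],
                         score_fn: Callable[[List[str]], float],
                         tolerance: float = 0.01) -> List[str]:
    """Backward elimination along a fixed importance ranking (hypothetical rule).

    Drops the least important remaining feature as long as score_fn on the
    reduced set stays within `tolerance` of the score of the full feature set.
    """
    ranked = sorted(importances, key=importances.get, reverse=True)
    baseline = score_fn(ranked)
    kept = ranked
    while len(kept) > 1:
        candidate = kept[:-1]
        if score_fn(candidate) < baseline - tolerance:
            break  # removing this feature costs too much performance
        kept = candidate
    return kept
```

Comparing every case with every control is quadratic in sample size, which keeps the definition transparent; a sort-based computation gives the same value faster on survey-sized data.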

Highlights

Importance of depression: Depression is a common debilitating psychiatric condition characterized by a low-spirited mood, loss of interest, and a range of emotional, cognitive, physical, and behavioral symptoms.
The XGBoost-derived depression model consisted of 18 factors, with an area under the weighted receiver operating characteristic curve of 0.86.
Triglyceride level was a confounder of hypercholesterolemia and diabetes, smoking was a significant risk factor in females, and weight gain was associated with depression involving diabetes.
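The study assessed confounding and interaction with survey-weighted multiple logistic regression. As a simpler stand-in that shows the same two ideas, the sketch below uses stratified 2x2 tables of (survey-weighted) counts: confounding shows up as a gap between the crude odds ratio and the Mantel-Haenszel odds ratio adjusted for a stratifying variable (for example, triglyceride level), and interaction shows up as differing stratum-specific odds ratios (for example, smoking in females vs males). The function names, the table layout, and the 10% change-in-estimate threshold are illustrative assumptions, not the authors' procedure.

```python
from typing import Sequence, Tuple

# One stratum as a 2x2 table of (survey-weighted) counts:
#   a = exposed with depression,   b = exposed without depression,
#   c = unexposed with depression, d = unexposed without depression.
Table = Tuple[float, float, float, float]


def odds_ratio(t: Table) -> float:
    """Odds ratio (a*d)/(b*c) of a single 2x2 table."""
    a, b, c, d = t
    return (a * d) / (b * c)


def crude_odds_ratio(strata: Sequence[Table]) -> float:
    """Odds ratio after collapsing over strata (ignores the third variable)."""
    collapsed = tuple(sum(t[i] for t in strata) for i in range(4))
    return odds_ratio(collapsed)


def mantel_haenszel_odds_ratio(strata: Sequence[Table]) -> float:
    """Odds ratio adjusted for the stratifying variable."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


def looks_confounded(strata: Sequence[Table], threshold: float = 0.10) -> bool:
    """Change-in-estimate heuristic (assumed 10% rule).

    Flags confounding when the crude and adjusted odds ratios differ by more
    than `threshold`, relative to the adjusted value.
    """
    crude = crude_odds_ratio(strata)
    adjusted = mantel_haenszel_odds_ratio(strata)
    return abs(crude - adjusted) / adjusted > threshold


def odds_ratio_ratio(stratum_1: Table, stratum_2: Table) -> float:
    """Ratio of stratum-specific odds ratios; values far from 1 suggest effect modification."""
    return odds_ratio(stratum_1) / odds_ratio(stratum_2)
```

This only describes the pattern; a formal assessment of effect modification would use, for example, the Breslow-Day test or an interaction term in a logistic regression model.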

Summary

Introduction

Depression is a common debilitating psychiatric condition characterized by a low-spirited mood, loss of interest, and a range of emotional, cognitive, physical, and behavioral symptoms. It has a high global disease burden and had been projected to become the second leading cause of disability-adjusted life years worldwide by 2020 [1]. Given the multiple psychological and sociocultural factors underlying the pathogenesis of depression, an integrated model with confounder adjustment may provide a better understanding of depression and support a multifaceted, individualized approach to it. Survey-weighted logistic regression is commonly used to identify depression-associated factors: a simple regression model is built for each candidate factor, adjusted for age and sex. However, in epidemiological studies, finding the best subset of factors is challenging when the number of explanatory variables is large.
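The per-factor screening described above can be sketched as a weighted logistic regression with the candidate factor, age, and sex as covariates, fitted by Newton-Raphson. This Python sketch uses only the standard library, and its function names are hypothetical. It uses the survey weights only in the point estimates; a real complex-survey analysis such as one on KNHANES also needs the strata and primary sampling units to obtain valid standard errors and P values, which are omitted here.

```python
import math
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_weighted_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                          w: Sequence[float], max_iter: int = 50,
                          tol: float = 1e-10) -> List[float]:
    """Weighted logistic regression by Newton-Raphson.

    An intercept is prepended; returns [intercept, coef_1, ..., coef_k].
    Maximizes the weighted log-likelihood sum_i w_i * loglik_i.
    """
    rows = [[1.0] + [float(v) for v in xi] for xi in X]
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for xi, yi, wi in zip(rows, y, w):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            for j in range(p):
                grad[j] += wi * (yi - mu) * xi[j]  # weighted score
                for k in range(p):
                    hess[j][k] += wi * mu * (1.0 - mu) * xi[j] * xi[k]  # weighted information
        step = _solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def age_sex_adjusted_or(factor: Sequence[float], age: Sequence[float],
                        sex: Sequence[int], y: Sequence[int],
                        w: Sequence[float]) -> float:
    """Odds ratio of one candidate factor, adjusted for age and sex."""
    X = [[f, a, s] for f, a, s in zip(factor, age, sex)]
    beta = fit_weighted_logistic(X, y, w)
    return math.exp(beta[1])  # beta[0] is the intercept
```

Calling `age_sex_adjusted_or` once per candidate factor mirrors the one-factor-at-a-time approach; each call fits a separate model, so the factors are never evaluated jointly.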

Methods

Results

Discussion

Conclusion


Journal: Journal of Medical Internet Research
Publication Date: Jun 24, 2021
Citations: 12
License type: CC BY


Similar Papers

Machine learning-based identification and related features of depression in patients with diabetes mellitus based on the Korea National Health and Nutrition Examination Survey: A cross-sectional study.
Ji-Yoon Lee ... Doyeon Won
PloS one | VOL. 18 | 13 Jul 2023

Investigation of the Noise Sensitivity of Machine Learning Algorithms on Credit Card Fraud Detection
İlhan Aytutuldu ... Mürüvvet Aslı Aydin
09 Jun 2021

Machine Learning-Based Network Status Detection and Fault Localization
Ayse Rumeysa Mohammed ... Shervin Shirmohammadi
IEEE Transactions on Instrumentation and Measurement | VOL. 70 | 01 Jan 2020

Machine learning models to identify low adherence to influenza vaccination among Korean adults with cardiovascular disease
Moojung Kim ... Young Jae Kim
BMC Cardiovascular Disorders | VOL. 21 | 09 Mar 2021
