Abstract
Background
The UAE Healthy Future Study (UAEHFS) is one of the first large prospective cohort studies in the Gulf region to examine causes of and risk factors for chronic diseases among adult UAE nationals. Missing values are often unavoidable in empirical research and can, in many cases, lead to bias. The aim of this study is to estimate the percentage of depression in the UAEHFS pilot data from the eight-item Patient Health Questionnaire (PHQ-8) variables, using different statistical methods of handling missing values.

Methods
Five common statistical and machine learning methods of handling missing values were included in this analysis: mode imputation, k-nearest neighbor (KNN) imputation, classification and regression trees (CART), random forest (RF) imputation, and random sampling from observed values (Sample). Multiple imputation was performed with 100 imputations.

Results
A total of 487 eligible participants (94.2%) were included in the analysis, and 231 (44.7%) were included in the complete case analysis. The median age was 30 years (interquartile range: 23-38). More males (67.8%) than females were included in the analysis. The estimated percentage of depression was 8.4% with mode imputation, 8.9% with complete case analysis, 9.9% with Sample, 12.5% with RF, 15.4% with CART, and 17.9% with KNN. In additional analyses, the estimated percentages of depression were 11.5% with complete case analysis, 11.9% with KNN, 13.2% with k-means clustering, and 13.2% with RF.

Conclusions
The estimated percentage of depression in the UAEHFS pilot data varies between the methods applied to handle missing values, which shows that missing values in the PHQ-8 variables are not negligible. Further research using multiple imputation is needed in the main UAEHFS dataset once recruitment is complete.

Key messages
• For missing values in the depression (PHQ-8) variables, we recommend multiple imputation, not to generate data but to avoid excluding observed data.
• To obtain a better estimate of the percentage of depression, we recommend applying several different machine learning methods.
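The difference between dropping incomplete records and imputing them can be illustrated with a minimal sketch. It assumes that each PHQ-8 item is scored 0-3 and that depression is defined by the conventional cutoff of a total score of at least 10; the abstract does not state the cutoff. The six-participant dataset is hypothetical, and the numbers it produces are not study results. Only complete case analysis, mode imputation, and random sampling from observed values (Sample) with pooling over multiple imputations are shown; KNN, CART, and RF imputation need a modelling library and are omitted. All function names are illustrative.

```python
import random
from collections import Counter

# Assumption: conventional PHQ-8 cutoff (total score >= 10); not stated in the abstract.
PHQ8_CUTOFF = 10
N_ITEMS = 8  # PHQ-8 items, each scored 0-3

# Hypothetical toy data: one row per participant, None marks a missing item.
rows = [
    [3, 3, 2, 2, 2, 1, 1, 0],     # complete, total 14
    [0, 0, 0, 0, 0, 0, 0, 0],     # complete, total 0
    [1, 0, 0, 0, 0, 1, 0, 0],     # complete, total 2
    [3, 3, 2, None, 0, 1, 1, 0],  # observed total 10
    [0, 0, 0, 0, None, 0, 0, 0],  # observed total 0
    [2, 2, 2, 1, None, 1, 1, 0],  # observed total 9: status depends on the imputed item
]


def is_depressed(items):
    """True if the PHQ-8 total score meets the (assumed) cutoff."""
    return sum(items) >= PHQ8_CUTOFF


def proportion_depressed(data):
    """Share of participants classified as depressed."""
    return sum(is_depressed(r) for r in data) / len(data)


def complete_case(data):
    """Complete case analysis: drop every participant with any missing item."""
    return proportion_depressed([r for r in data if None not in r])


def observed_by_item(data):
    """Observed (non-missing) values of each item across participants."""
    return [[r[j] for r in data if r[j] is not None] for j in range(N_ITEMS)]


def mode_impute(data):
    """Single imputation: fill each gap with the item's most frequent observed value."""
    modes = [Counter(values).most_common(1)[0][0] for values in observed_by_item(data)]
    return [[modes[j] if v is None else v for j, v in enumerate(r)] for r in data]


def sample_impute(data, rng):
    """Fill each gap with a random draw from the same item's observed values."""
    observed = observed_by_item(data)
    return [[rng.choice(observed[j]) if v is None else v for j, v in enumerate(r)]
            for r in data]


def pooled_sample_estimate(data, m=100, seed=1):
    """Multiple imputation: average the proportion over m imputed datasets
    (the point estimate from Rubin's rules; variance pooling is omitted)."""
    rng = random.Random(seed)
    return sum(proportion_depressed(sample_impute(data, rng)) for _ in range(m)) / m


if __name__ == "__main__":
    print("complete case:", complete_case(rows))
    print("mode imputation:", proportion_depressed(mode_impute(rows)))
    print("sample, m=100:", pooled_sample_estimate(rows))
```

In this toy example, mode imputation always fills the last participant's missing item with 0, the most common response, so that participant is never classified as depressed; random sampling occasionally draws a 2 and so raises the pooled estimate. For the omitted methods, the R package mice, for example, accepts "cart" and "rf" (as well as "sample") in its method argument, but the abstract does not say which software the study used.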