Abstract

Outlier detection and removal is an important step in preparing data for statistical analysis, yet the current literature offers no gold standard. The objective of this study is to identify the ideal decision test using dietary data from the National Health and Nutrition Examination Survey (NHANES) 2017-2018. We conducted a secondary analysis of NHANES 24-h dietary recalls, accounting for the survey's multi-stage cluster design. Six outlier detection and removal strategies were assessed by evaluating each decision test's impact on Pearson's correlation coefficient among macronutrients. We also assessed changes in the effect size estimates across pre-defined sample sizes. The data were collected as part of the 2017-2018 24-h dietary recall among adult participants (N=4,893). Across all decision tests, changes in effect estimates for macronutrients ranged from 6.5% for protein to 39.3% for alcohol. The largest proportion of observations removed as outliers was 4.0%, in the large sample, under the decision test of >2 standard deviations from the mean. The smallest sample, particularly in the alcohol analysis, was the most affected by the six decision tests when compared with no decision test. This study, the first to use 2017-2018 NHANES dietary data for outlier evaluation, emphasizes the importance of selecting an appropriate decision test in light of statistical power, sample size, normality assumptions, the proportion of data removed, effect estimate changes, and the consistency of estimates across sample sizes. We recommend the use of non-parametric tests for non-normally distributed variables of interest.
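
The kind of comparison described above can be illustrated with a minimal sketch. It is not the study's code: the abstract names only one of the six decision tests (>2 standard deviations from the mean), so the 1.5 × IQR fence is an assumed stand-in for a non-parametric alternative, all function names are hypothetical, the data are synthetic rather than NHANES records, and the survey's weights and cluster design are ignored.

```python
import math
import random
import statistics


def pearson(x, y):
    """Unweighted Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def keep_within_sd(values, k=2.0):
    """Parametric rule: keep values within k sample SDs of the mean."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [abs(v - mean) <= k * sd for v in values]


def keep_within_iqr(values, k=1.5):
    """Assumed non-parametric rule: keep values inside the k * IQR fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [q1 - k * iqr <= v <= q3 + k * iqr for v in values]


def evaluate(x, y, rule):
    """Apply a rule to both variables and report how the correlation shifts.

    A pair is dropped if either value is flagged. Assumes the correlation
    before removal is non-zero, since the change is expressed relative to it.
    """
    keep = [a and b for a, b in zip(rule(x), rule(y))]
    xf = [v for v, k in zip(x, keep) if k]
    yf = [v for v, k in zip(y, keep) if k]
    r_before = pearson(x, y)
    r_after = pearson(xf, yf)
    return {
        "removed_pct": 100 * (1 - len(xf) / len(x)),
        "r_before": r_before,
        "r_after": r_after,
        "change_pct": 100 * abs(r_after - r_before) / abs(r_before),
    }


if __name__ == "__main__":
    # Synthetic, right-skewed "energy" and "protein" intakes (not real data).
    random.seed(0)
    energy = [random.lognormvariate(7.6, 0.4) for _ in range(500)]
    protein = [0.04 * e * random.lognormvariate(0.0, 0.3) for e in energy]
    for name, rule in [(">2 SD", keep_within_sd), ("1.5 x IQR", keep_within_iqr)]:
        print(name, evaluate(energy, protein, rule))
```

Running the same evaluation on subsamples of different sizes would mirror the abstract's comparison across pre-defined sample sizes. With skewed intake variables the mean and standard deviation are themselves pulled by extreme values, which is one reason a rank-based rule may behave differently from the >2 SD rule.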
