Abstract

Missing data are a common problem in most research fields and introduce an element of ambiguity into data analysis. They can arise for different reasons: mishandling of samples, measurement error, deletion of aberrant values, or simply a lack of analysis. The nutrition domain is no exception. This paper addresses the problem of missing data in food composition databases (FCDBs). Missing values leave FCDBs incomplete, which limits their use, because a dietary assessment can be performed only on a complete dataset. Most often, the problem is resolved by calculating means or medians from existing data in the same database, or by borrowing data from other FCDBs. These solutions introduce significant error. We focus on imputation techniques that substitute missing values with statistical predictions: Non-Negative Matrix Factorization (NMF), Multiple Imputation by Chained Equations (MICE), Nonparametric Missing Value Imputation using Random Forest (MissForest), and K-Nearest Neighbors (KNN). We compare them with the commonly used approaches of fill-in with the mean and fill-in with the median. The data come from national FCDBs collected by EuroFIR (European Food Information Resource Network). The results show that the state-of-the-art imputation methods yield better results than the traditional approaches.
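To make the contrast between the traditional fill-in approaches and a prediction-based method concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the helper names `fill_in` and `knn_impute`, the choice of `k`, the distance measure, and the nutrient values in `foods` are all assumptions made up for illustration. It covers only mean/median fill-in and a simple KNN variant, not NMF, MICE, or MissForest.

```python
import math
from statistics import mean, median

# Hypothetical food rows: [protein, fat, carbohydrate] per 100 g.
# The numbers are invented for illustration; None marks a missing value.
foods = [
    [10.0, 2.0, 30.0],
    [11.0, 2.5, None],
    [10.5, 2.2, 31.0],
    [1.0, 20.0, 5.0],
    [1.2, None, 4.0],
    [0.8, 22.0, 6.0],
]

def fill_in(rows, stat=mean):
    """Replace each None with a column statistic over the observed values."""
    n_cols = len(rows[0])
    col_stats = [
        stat([r[j] for r in rows if r[j] is not None]) for j in range(n_cols)
    ]
    return [
        [col_stats[j] if v is None else v for j, v in enumerate(r)]
        for r in rows
    ]

def _distance(a, b):
    """Root-mean-square difference over the columns observed in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

def knn_impute(rows, k=2):
    """Replace each None with the mean of that column over the k nearest rows.

    Only rows that have the column observed can act as donors. This sketch
    assumes every column has at least one observed value.
    """
    out = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = sorted(
                (_distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            out[i][j] = mean(val for _, val in donors[:k])
    return out

mean_filled = fill_in(foods, stat=mean)
median_filled = fill_in(foods, stat=median)
knn_filled = knn_impute(foods, k=2)

print("mean  :", mean_filled[1][2], mean_filled[4][1])
print("median:", median_filled[1][2], median_filled[4][1])
print("knn   :", knn_filled[1][2], knn_filled[4][1])
```

In this toy table the column mean and median ignore which foods resemble each other, while the KNN variant borrows values only from the rows closest to the incomplete one. The sketch illustrates the mechanism only and says nothing about the accuracy figures reported in the paper.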
