Statistical Analysis of Chemical Element Compositions in Food Science: Problems and Possibilities.

Matthias Templ, Barbara Templ

doi:10.3390/molecules26195752

Abstract

In recent years, many analyses have been carried out to investigate the chemical components of food. However, studies rarely consider the compositional pitfalls of such analyses. This is problematic because applying non-compositional statistical analysis to compositional datasets may lead to arbitrary results. In this study, compositional data analysis (CoDa), which is widely used in other research fields, is compared with classical statistical analysis to demonstrate how the results vary depending on the approach and to show the best possible statistical analysis. For example, honey and saffron are highly susceptible to adulteration and imitation, so the determination of their chemical elements requires the best possible statistical analysis. Our study demonstrated how principal component analysis (PCA) and classification results are influenced by the pre-processing steps conducted on the raw data and by the replacement strategies for missing values and non-detects. Furthermore, it demonstrated the differences in results when compositional and non-compositional methods were applied. Our results suggested that log-ratio analysis provided better separation between the pure and adulterated data, easier interpretability of the results, and a higher accuracy of classification. Similarly, they showed that classification with artificial neural networks (ANNs) works poorly if the CoDa pre-processing steps are left out. Based on these results, we advise applying CoDa methods to analyses of the chemical elements of food and to the characterization and authentication of food products.

Highlights

The importance of food composition data to nutrition and public health has long been acknowledged [1].
An inspection of the literature on the analytical and statistical methods frequently used in food science [2,3,4], as well as in the chemometrics of honey [7], shows that it does not mention compositional data analysis (CoDa) [8].
The aim of this research was to compare compositional data analysis with classical statistical analyses to demonstrate how data pre-processing can influence a multivariate analysis, how a proper analysis can improve interpretation, and how a compositional method improves the accuracy of classification.

Summary

Introduction

The importance of food composition data to nutrition and public health has long been acknowledged [1]. An inspection of the literature on the analytical and statistical methods frequently used in food science [2,3,4], as well as in the chemometrics of honey [7], shows that it does not mention compositional data analysis (CoDa) [8]. A composition is the quantified decomposition of a whole into its component parts. It has been described as a random vector with strictly positive components that add up to a whole, e.g., 100. The term covers all vectors that represent parts of a whole and carry only relative information. CoDa, including the log-ratio methodology described later, is a methodology for describing the parts of a whole, which convey relative information.
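To make the notion of "relative information" concrete, here is a minimal Python sketch. It is an illustration under stated assumptions and does not reproduce the authors' code: the helper names `closure` and `clr` and the element concentrations are hypothetical, and the centered log-ratio (clr) is used only as one common example of a log-ratio transformation, not necessarily the one applied in the study. The sketch shows that multiplying all parts of a sample by a constant (e.g., a change of units or dilution) changes the raw values but leaves the closed composition and its log-ratio coordinates unchanged.

```python
import numpy as np


def closure(x, total=100.0):
    """Rescale strictly positive parts so that they sum to `total`."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("all parts of a composition must be strictly positive")
    return total * x / x.sum(axis=-1, keepdims=True)


def clr(x):
    """Centered log-ratio: log of each part minus the mean log of all parts."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("all parts of a composition must be strictly positive")
    log_x = np.log(x)
    return log_x - log_x.mean(axis=-1, keepdims=True)


# Hypothetical concentrations of four elements in one sample (made-up values).
sample = np.array([120.0, 30.0, 6.0, 4.0])

# The same sample after multiplying every part by a constant factor.
rescaled = 2.5 * sample

print(closure(sample))    # parts expressed as percentages of the whole
print(closure(rescaled))  # identical: only the ratios between parts matter
print(clr(sample))        # log-ratio coordinates
print(clr(rescaled))      # identical to the line above
```

Because the clr coordinates depend only on ratios between parts, they sum to zero for every sample; classical methods applied to the raw values, by contrast, would treat `sample` and `rescaled` as different observations.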

Objectives

Methods

Results

Discussion

Conclusion