Abstract
Data analysis is a pivotal step in the metabolomics investigation pipeline because it allows knowledge to be extracted from datasets, yielding new insights and perspectives and generating new hypotheses about complex biological processes. Data analysis uses various strategies, including exploratory analysis, which evaluates metabolites individually, and machine learning algorithms, which can be unsupervised (data driven) or supervised (task driven) and allow the interplay among the investigated metabolites to be studied. Among unsupervised methods, clustering algorithms and principal component analysis are particularly important in metabolomics. Among supervised methods, partial least squares discriminant analysis, artificial neural networks, and support vector machines play a prominent role. These algorithms can be used on their own or combined in ensemble models. This chapter places particular emphasis on strategies for evaluating the effectiveness of machine learning training, including validation strategies, feature selection mechanisms, and hyperparameter optimization.