Abstract

Exploratory data analysis (EDA) through the projection of multivariate data into spaces of low dimensionality, using methods such as principal components analysis (PCA), is at the core of chemometric applications in many fields, including metabolomics, biomarker discovery, and food authentication. In addition to revealing underlying class structures in the data, unsupervised EDA methods have become a de facto method of confirmatory analysis for object classification (e.g., by health status or provenance), especially for small sample sizes, because they are not plagued by the overfitting problems that often accompany supervised methods. However, the characteristics of the scores plots for EDA projection methods are often highly dependent on the data analysis options chosen, such as the type of preprocessing used, the projection method employed, the variables selected, and (in the case of multiblock data) how the data are combined. The combinations of these options can lead to hundreds of different scores plots that must be manually assessed for results of interest to the researcher. The present work is intended to expedite this process through a relational analysis of multiple results, using Procrustes analysis to compare projections and hierarchical clustering to summarize the comparisons in the form of a dendrogram. The software developed, ScorXplor, allows projections to be quickly assessed for their similarity and quality, with interactive display of scores plots for visual evaluation. Moreover, the approach provides a better understanding of the roles of, and relationships among, the various analysis options (preprocessing, analysis tools, etc.). The method is demonstrated using multiblock spectral data (UV–visible, near-infrared, mid-infrared) for flavored olive oils from different regions of Italy, implementing different preprocessing and fusion options, and applying PCA and maximum likelihood PCA as projection methods.
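The comparison step described in the abstract (Procrustes analysis between projections, followed by hierarchical clustering) can be illustrated with a minimal Python sketch. This is not the ScorXplor implementation: the synthetic data, the helper names (`pca_scores`, `autoscale`, `procrustes_distance_matrix`), the choice of average linkage, and the use of SciPy's Procrustes disparity as the dissimilarity measure are all assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial import procrustes
from scipy.spatial.distance import squareform


def pca_scores(X, n_components=2):
    """Scores of a mean-centered PCA computed via SVD (hypothetical helper)."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * s[:n_components]


def autoscale(X):
    """Scale every variable to unit variance (one possible preprocessing option)."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def procrustes_distance_matrix(score_sets):
    """Pairwise Procrustes disparities between sets of scores.

    scipy.spatial.procrustes centers and scales both configurations, finds the
    best orthogonal transformation, and returns the residual sum of squares
    (0 = identical shape, values near 1 = unrelated).
    """
    n = len(score_sets)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            _, _, disparity = procrustes(score_sets[i], score_sets[j])
            D[i, j] = D[j, i] = disparity
    return D


# Synthetic stand-in for real data (assumption): 30 samples, 3 classes, 50 variables.
rng = np.random.default_rng(0)
classes = np.repeat([0, 1, 2], 10)
centers = rng.normal(scale=3.0, size=(3, 50))
X = centers[classes] + rng.normal(size=(30, 50))

# A 40-degree rotation, used to show that Procrustes ignores plot orientation.
theta = np.deg2rad(40.0)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])

raw = pca_scores(X)
names = ["mean-centered", "autoscaled", "mean-centered, rotated", "random noise"]
score_sets = [raw, pca_scores(autoscale(X)), raw @ R, rng.normal(size=(30, 2))]

# Relational analysis: disparity matrix, then average-linkage clustering.
D = procrustes_distance_matrix(score_sets)
Z = linkage(squareform(D), method="average")
clusters = fcluster(Z, t=2, criterion="maxclust")

for name, cluster in zip(names, clusters):
    print(f"{name}: cluster {cluster}")
```

In this sketch the rotated copy has essentially zero disparity from the original scores, so the two always merge first, while a structureless projection sits far from the others. Passing `Z` to `scipy.cluster.hierarchy.dendrogram` would draw the summary tree; with hundreds of option combinations, each leaf would correspond to one scores plot.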
