Abstract

Multivariate exploratory data analysis makes it possible to reveal patterns and extract information from complex multivariate data sets. However, highly complex data may not show evident groupings or trends in the principal component space, e.g. because the variation of the variables is not grouped but rather continuous. In these cases, classical exploratory methods may not provide satisfactory results when the aim is to find distinct groupings in the data. To enhance information extraction in such situations, we propose a novel approach inspired by the concept of combining weak classifiers, but in the unsupervised context. The approach is based on the fusion of several adjacency matrices obtained by applying different distance measures to data from different analytical platforms. This paper presents and discusses the potential of the approach through a benchmark data set of beer samples. The beer data were acquired using three spectroscopic techniques: Visible, Near-Infrared and Nuclear Magnetic Resonance. The results of fusing the three data sets via the proposed approach are compared with those from the single data blocks (Visible, NIR and NMR) and from a standard mid-level data fusion methodology. It is shown that, with the suggested approach, groupings related to beer style and other features are efficiently recovered and are generally more evident.
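As a rough illustration of the idea outlined above, the snippet below fuses adjacency matrices built from several distance measures on several data blocks and then clusters the fused matrix. This is a minimal sketch, not the paper's exact protocol: the synthetic blocks stand in for the Visible, NIR and NMR spectra, and the k-nearest-neighbour binarisation, the simple averaging and the average-linkage clustering are illustrative assumptions.

```python
# Minimal sketch (NOT the authors' exact protocol): fuse adjacency matrices
# built from several distance measures on several data blocks, then cluster.
# The k-NN binarisation, simple averaging and final linkage are illustrative
# assumptions; the random blocks stand in for the Vis/NIR/NMR spectra.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
n_samples = 50
# Hypothetical stand-ins for the three spectroscopic blocks (rows = samples)
blocks = [rng.normal(size=(n_samples, p)) for p in (200, 700, 1000)]

def knn_adjacency(dist_matrix, k=10):
    """Binary adjacency: 1 if a sample is among the k nearest neighbours."""
    adj = np.zeros_like(dist_matrix)
    order = np.argsort(dist_matrix, axis=1)
    for i, row in enumerate(order):
        adj[i, row[1:k + 1]] = 1.0   # skip self (column 0 after sorting)
    return np.maximum(adj, adj.T)    # symmetrise

# One adjacency matrix per (block, distance measure) combination
adjacencies = []
for X in blocks:
    for metric in ("euclidean", "correlation"):
        D = squareform(pdist(X, metric=metric))
        adjacencies.append(knn_adjacency(D))

# Fuse by averaging: entries close to 1 mean two samples are neighbours
# under most distance measures and data blocks
fused = np.mean(adjacencies, axis=0)

# Cluster the fused similarity (turned back into a dissimilarity)
dissim = 1.0 - fused
np.fill_diagonal(dissim, 0.0)
Z = linkage(squareform(dissim, checks=False), method="average")
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

Entries of the fused matrix close to 1 correspond to sample pairs that are neighbours under most distance measures and blocks, mimicking the majority-vote logic of combined weak classifiers in an unsupervised setting.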

Highlights

  • Exploratory multivariate data analysis (EMDA, [1]) offers very powerful tools for looking into complex data

  • The first one, the Ales group, is mainly composed of ale-style samples and is less homogeneous than the second, the Lagers group, which is largely composed of lager-style samples

  • Some of the mid-coloured samples are spread along PC2, and the four samples with the strongest absorption have negative scores on this component

Introduction

Exploratory multivariate data analysis (EMDA, [1]) offers very powerful tools for looking into complex data. Using EMDA it is possible, for example, to reveal underlying structures, discover groups of similar samples, and visualize such patterns in an accessible and simple way. Self-Organizing Maps (SOMs, [10,11]) are considered complementary to methods like PCA [12] because of their ability to account for non-linear phenomena. All these techniques are called “projection” methods, since they are based on projecting the original high-dimensional data onto a space of lower dimension, which makes it easier to model, plot and visualize the data. Dissimilarity (or similarity) is at the core of clustering, and it is often assessed using a distance measure, on the basis of which linkage/grouping criteria are defined.
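For concreteness, the short sketch below contrasts the two families of tools mentioned above on synthetic data: a projection method (PCA) that yields low-dimensional scores for plotting, and hierarchical clustering, where the choice of a distance measure together with a linkage criterion defines the grouping. Library calls follow standard scikit-learn and SciPy conventions; the data and parameter choices are placeholders, not values from the paper.

```python
# Illustrative sketch: projection (PCA) vs. distance-based clustering.
# Synthetic data stand in for a real spectral matrix (rows = samples).
import numpy as np
from sklearn.decomposition import PCA
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 300))            # 40 samples, 300 variables

# Projection: scores in a low-dimensional space, suitable for score plots
scores = PCA(n_components=2).fit_transform(X)

# Clustering: pick a distance measure, then a linkage criterion
Z = linkage(pdist(X, metric="euclidean"), method="ward")
groups = fcluster(Z, t=3, criterion="maxclust")
print(scores.shape, np.unique(groups))
```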
