Abstract

The t‐distributed stochastic neighbour embedding algorithm, or t‐SNE, is a non‐linear dimension reduction method used to visualise multivariate data. It enables a high‐dimensional dataset, such as a set of infrared spectra, to be represented on a single, typically two‐dimensional graph that reveals its global and local structure. t‐SNE is very popular in the machine learning community and has been applied in many fields, generally with the aim of visualising large datasets. In vibrational spectroscopy, t‐SNE is gaining recognition, but principal component analysis (PCA) remains by far the reference method for exploratory analysis and dimension reduction. Nevertheless, t‐SNE can be a real aid in the analysis of vibrational spectroscopic datasets. It provides an at‐a‐glance global view of the dataset, making it possible to distinguish the main factors influencing the spectral signal and the hierarchy between these factors, and it gives an indication of whether predictive modelling is feasible. It can also greatly support the choice of pre‐processing by rapidly comparing different general pre‐processing approaches according to their effect on the variable of interest. Here we illustrate these advantages using different datasets. We also propose an approach based on a synergy between t‐SNE and PCA, allowing the respective advantages of each to be exploited.
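
As a rough illustration of what representing spectra on a two‐dimensional graph involves, the sketch below computes PCA scores and a t‐SNE map for the same data. It is not the procedure or the combined approach used in this work. The data are synthetic stand‐ins for spectra, and the use of scikit‐learn, the variable names and the parameter values (perplexity, initialisation, random seed) are all assumptions made for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Synthetic stand-in for spectra (assumption, not real data):
# 3 groups of 40 "spectra", each a Gaussian band at a
# group-specific position plus random noise.
axis = np.linspace(0.0, 1.0, 200)
centres = [0.3, 0.5, 0.7]
labels = np.repeat(np.arange(3), 40)
spectra = np.array(
    [np.exp(-((axis - centres[g]) ** 2) / 0.002) for g in labels]
) + rng.normal(scale=0.05, size=(120, 200))

# Linear view: scores on the first two principal components.
pca_scores = PCA(n_components=2).fit_transform(spectra)

# Non-linear view: two-dimensional t-SNE embedding of the same spectra.
# Perplexity must be smaller than the number of samples.
tsne_map = TSNE(
    n_components=2, perplexity=30.0, init="pca", random_state=0
).fit_transform(spectra)

print(pca_scores.shape, tsne_map.shape)
```

Plotting `pca_scores` and `tsne_map` side by side, coloured by `labels`, gives the kind of at‐a‐glance comparison described above. The t‐SNE map depends on the perplexity and on the random seed, so it is worth checking several settings before interpreting it.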
