Abstract

This paper introduces Adaptive Clustering around Latent Variables (AdaCLV), a new method for simultaneous dimensionality reduction and variable clustering (the partitioning of variables into groups). This unsupervised method is particularly well suited to the exploration of spectroscopic datasets, such as Nuclear Magnetic Resonance (NMR) spectra, and can be used to identify potential biomarkers. AdaCLV is inspired by existing multivariate methods from the Clustering around Latent Variables (CLV) family, but offers several key advantages over them. First, AdaCLV estimates cluster membership degrees that are more interpretable and more representative of spectroscopic data, where peaks for different molecules (i.e. variable clusters) may overlap and variables within a cluster have different degrees of importance. Second, AdaCLV is less sensitive to its hyperparameters than competing methods, adapting to the clustering structure present in the data. This paper compares AdaCLV with existing CLV methods and other competitors in experiments involving real and semi-artificial NMR spectra. AdaCLV is shown to be more robust to hyperparameter choice and to achieve better precision than the other methods for all cluster numbers, sample sizes and signal levels tested, while achieving a comparable level of recall.
