Abstract

Raman spectroscopy is a popular tool for characterizing complex biological materials and their geological remains. Ordination methods, such as principal component analysis (PCA), use spectral variance to create a compositional space, the ChemoSpace, in which samples group according to spectroscopic manifestations of different biological properties or geological processes. PCA reduces the dimensionality of complex spectroscopic data and facilitates the extraction of informative features into formats suitable for downstream statistical analyses, and thus represents a first step in the development of diagnostic biosignatures from complex modern and fossil tissues. For such samples, however, there is presently no systematic and accessible survey of how sample, instrument, and spectral processing affect the occupation of the ChemoSpace. Here, the influence of sample count, unwanted signals and different signal‐to‐noise ratios, spectrometer decalibration, baseline subtraction, and spectral normalization on ChemoSpace grouping is investigated and exemplified using synthetic spectra. An increase in sample size improves the dissociation of groups in the ChemoSpace, and our sample yields a representative and mostly stable pattern of occupation with fewer than 10 samples per group. The impact on compositional biological signatures of systemic interference of different amplitude and frequency, that is, periodical or random features that can be introduced by instrument or sample, is reduced by PCA, which allows biological information to be extracted even when spectra of differing signal‐to‐noise ratios are compared. Routine offsets ( 1 cm−1) in spectrometer calibration contribute in our sample to less than 0.1% of the total spectral variance captured in the ChemoSpace and do not obscure biological information. Standard adaptive baselining, together with normalization, increases spectral comparability and facilitates the extraction of informative features. The ChemoSpace approach to biosignatures represents a powerful tool for exploring, denoising, and integrating molecular information from modern and ancient organismal samples.
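The workflow summarized above (synthetic spectra, baseline subtraction, normalization, then PCA) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the study: the band positions and heights, the noise level, the group size of 10, the helper names, and in particular the simple linear baseline fit, which stands in for the adaptive baselining mentioned in the abstract. PCA is computed here with a plain singular value decomposition of the mean-centered data.

```python
import numpy as np

def gaussian_band(x, center, width, height):
    """One Gaussian-shaped band on the wavenumber axis x."""
    return height * np.exp(-0.5 * ((x - center) / width) ** 2)

def synthetic_spectrum(x, heights, rng, noise_sd=0.02):
    """Three bands plus a random linear baseline and white noise (all hypothetical)."""
    centers = (1000.0, 1450.0, 1650.0)  # illustrative band positions in cm^-1
    signal = sum(gaussian_band(x, c, 15.0, h) for c, h in zip(centers, heights))
    baseline = rng.uniform(0.0, 0.5) + rng.uniform(0.0, 2e-4) * (x - x[0])
    return signal + baseline + rng.normal(0.0, noise_sd, size=x.size)

def preprocess(x, spectra):
    """Linear baseline fit on band-free edges (a stand-in for adaptive
    baselining), followed by vector normalization of each spectrum."""
    edges = (x < 900.0) | (x > 1800.0)
    out = np.empty_like(spectra)
    for i, y in enumerate(spectra):
        slope, intercept = np.polyfit(x[edges], y[edges], 1)
        corrected = y - (slope * x + intercept)
        out[i] = corrected / np.linalg.norm(corrected)
    return out

def pca(data, n_components=2):
    """PCA via SVD: returns scores, loadings, and explained variance ratios."""
    centered = data - data.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, vt[:n_components], explained[:n_components]

rng = np.random.default_rng(0)
x = np.linspace(800.0, 1900.0, 551)

# Two hypothetical sample groups that differ only in relative band heights.
group_a = np.array([synthetic_spectrum(x, (1.0, 0.6, 0.3), rng) for _ in range(10)])
group_b = np.array([synthetic_spectrum(x, (0.4, 0.6, 1.0), rng) for _ in range(10)])
spectra = np.vstack([group_a, group_b])

scores, loadings, explained = pca(preprocess(x, spectra))
print("PC1 scores, group A:", np.round(scores[:10, 0], 3))
print("PC1 scores, group B:", np.round(scores[10:, 0], 3))
print("Explained variance ratio (PC1, PC2):", np.round(explained, 3))
```

In this toy setting the two groups occupy separate regions along the first principal component, and the loadings indicate which bands drive that separation. Changing `noise_sd`, the number of spectra per group, or the preprocessing steps gives a rough feel for the kinds of effects the study surveys, without reproducing its actual experiments.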