Mass spectrometry imaging (MSI) of sedimentary archives can produce records of molecular proxies at μm-scale resolution. For example, in annually varved sediments of the Santa Barbara Basin, such fine resolution allows sub-annual distributions of archaeal tetraether lipids, haptophyte-derived alkenones, and sterols to be deciphered. Here, we report an untargeted data processing workflow for dissecting MSI datasets and extracting information beyond that obtained by targeted analysis of known molecular proxies. The combination of MSI and the untargeted workflow not only increases the spatial resolution of molecular stratigraphy but also greatly increases the number and diversity of molecular signals evaluated, enabling us to discover unique molecular signatures imprinted by various biogeochemical processes. We applied the proposed workflow to two MSI datasets, both measured on the uppermost ∼10 cm of Santa Barbara Basin sediments but covering different mass ranges. Two matrices of 18625×293 and 18963×323 (number of spectra × number of peaks), respectively, were extracted after peak alignment using bin-wise kernel density estimation and subsequent peak picking by peak prominence filtering combined with geochemical context-based filtering. Feature extraction by non-negative matrix factorization revealed a total of 15 stable molecular clusters with distinct spatial distributions in the sediments. Each cluster typically comprised several to dozens of compounds, with the majority of compounds in each cluster likely belonging to similar chemical taxonomies. Some of these clusters can be linked to specific biogeochemical processes. For example, chlorin-like compounds are possibly related to diatom production, alkenones are related to coccolithophorid production, and steranes and long-chain fatty acids likely represent terrigenous input.
Supervised learning from these data mining results further extracted molecular signatures with proxy potential that appear to be linked to specific environmental conditions inferred from historical oceanographic data. However, generalizability to other sedimentary settings will require further investigation.
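As an illustration of the feature extraction step, the following is a minimal sketch of non-negative matrix factorization applied to a spectra × peaks intensity matrix. It is not the implementation used in this study: the multiplicative update rule, the function name `nmf_multiplicative`, the number of components, and the synthetic input matrix are all assumptions chosen for the example. Rows of `H` play the role of molecular clusters (peak loadings), and columns of `W` give the contribution of each cluster to each spectrum, which could be mapped back to pixel positions.

```python
import numpy as np


def nmf_multiplicative(X, n_components, n_iter=500, seed=0, eps=1e-9):
    """Approximate a non-negative matrix X (spectra x peaks) as W @ H.

    Uses multiplicative updates for the Frobenius-norm objective; this
    update rule is an assumption for illustration only.
    """
    rng = np.random.default_rng(seed)
    n_spectra, n_peaks = X.shape
    # Random non-negative initialization of both factors.
    W = rng.random((n_spectra, n_components)) + eps
    H = rng.random((n_components, n_peaks)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so W and H
        # remain non-negative throughout.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Synthetic stand-in for an MSI intensity matrix (hypothetical data):
# 200 spectra x 30 peaks generated from 3 hidden non-negative components.
rng = np.random.default_rng(42)
X = rng.random((200, 3)) @ rng.random((3, 30))

W, H = nmf_multiplicative(X, n_components=3)

# Relative reconstruction error of the factorization.
rel_error = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_error:.4f}")
```

In practice, the number of components and the stability of the resulting clusters would have to be assessed on the real data, for example by repeating the factorization with different random seeds.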