Nontargeted analysis (NTA) is increasingly utilized for its ability to identify key molecular features beyond known targets in complex samples. NTA is particularly advantageous in exploratory studies aimed at identifying phenotype-associated features or molecules able to classify various sample types. However, implementing NTA involves extensive data analyses and labor-intensive annotations. To address these limitations, we developed a rapid data screening capability compatible with NTA data collected on a liquid chromatography, ion mobility spectrometry, and mass spectrometry (LC-IMS-MS) platform that allows for sample classification while highlighting potential features of interest. Specifically, this method aggregates the thousands of IMS-MS spectra collected across the LC space for each sample and collapses the LC dimension, resulting in a single summed IMS-MS spectrum for screening. The summed IMS-MS spectra are then analyzed with a bootstrapped Lasso technique to identify key regions or coordinates for phenotype classification via support vector machines. Molecular annotations are then performed by examining the features present in the selected coordinates, highlighting potential molecular candidates. To demonstrate this summed IMS-MS screening approach, we applied it to clinical plasma lipidomic NTA data and exposomic NTA data from water sites with varying contaminant levels. Distinguishing coordinates were observed in both studies, enabling the evaluation of phenotypic molecular annotations and resulting in screening models capable of classifying samples with up to a 25% increase in accuracy compared to models using annotated data.
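The first two steps described above, collapsing the LC dimension and ranking IMS-MS coordinates by bootstrapped Lasso selection, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each sample is stored as a dense array of shape (LC frames, drift-time bins, m/z bins), uses a plain coordinate-descent Lasso in NumPy as a stand-in for whatever solver the study used, and runs on simulated intensities. All function names, the penalty `alpha`, the number of bootstrap resamples, and the array sizes are hypothetical choices for illustration. The support vector machine classification step is omitted.

```python
import numpy as np

def sum_over_lc(cube):
    """Collapse the LC axis of a (lc_frames, drift_bins, mz_bins) intensity array."""
    return np.asarray(cube, dtype=float).sum(axis=0)

def lasso_cd(X, y, alpha, n_iter=100):
    """Coordinate-descent Lasso: minimize (1/2n)||y - Xw||^2 + alpha * ||w||_1."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y - X @ w
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column in this resample, nothing to fit
            # Covariance of coordinate j with the residual that excludes j
            rho = X[:, j] @ resid / n + col_sq[j] * w[j]
            # Soft-thresholding sets weakly associated coordinates to exactly 0
            w_new = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            resid += X[:, j] * (w[j] - w_new)
            w[j] = w_new
    return w

def bootstrap_selection_frequency(X, y, alpha=0.1, n_boot=50, seed=0):
    """Fraction of bootstrap resamples in which each coordinate gets a nonzero weight."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample samples with replacement
        Xb = X[idx] - X[idx].mean(axis=0)
        sd = Xb.std(axis=0)
        sd[sd == 0] = 1.0
        Xb = Xb / sd
        yb = y[idx] - y[idx].mean()
        counts += (lasso_cd(Xb, yb, alpha) != 0)
    return counts / n_boot

def simulate_samples(n_samples=40, shape=(5, 4, 6), hot=(2, 3), seed=1):
    """Hypothetical data: class 1 has extra intensity at one (drift, m/z) coordinate."""
    rng = np.random.default_rng(seed)
    y = np.arange(n_samples) % 2
    rows = []
    for label in y:
        cube = rng.random(shape)  # simulated LC x drift x m/z intensities
        if label == 1:
            cube[:, hot[0], hot[1]] += 5.0
        rows.append(sum_over_lc(cube).ravel())  # one summed IMS-MS spectrum per sample
    return np.array(rows), y.astype(float)

X, y = simulate_samples()
freq = bootstrap_selection_frequency(X, y)
drift_bin, mz_bin = np.unravel_index(int(np.argmax(freq)), (4, 6))
print("most frequently selected coordinate:", drift_bin, mz_bin)
```

In a workflow like the one described, coordinates with high selection frequency would be the candidates passed on to the classifier and to molecular annotation. Dense arrays are used here only for brevity; real LC-IMS-MS data would likely need binning or sparse storage before summation.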