Abstract

Identification of unknowns is a bottleneck for large-scale untargeted analyses such as metabolomics or drug metabolite identification. Ion mobility-mass spectrometry (IM-MS) provides rapid two-dimensional separation of ions based on their mobility through a neutral buffer gas. The mobility of an ion is related to its collision cross section (CCS) with the buffer gas, a physical property that is determined by the size and shape of the ion. This structural dependency makes CCS a promising characteristic for compound identification, but its utility is limited by the availability of high-quality reference CCS values. CCS prediction using machine learning (ML) has recently shown promise in the field, but accurate and broadly applicable models are still lacking. Here we present a novel ML approach that employs a comprehensive collection of CCS values covering a wide range of chemical space. Using this diverse database, we identified the structural characteristics, represented by molecular quantum numbers (MQNs), that contribute to variance in CCS and assessed the performance of a variety of ML algorithms in predicting CCS. We found that by breaking down the chemical structural diversity using unsupervised clustering based on the MQNs, specific and accurate prediction models can be trained for each cluster, and that these models outperformed a single model trained on all data. Using this approach, we have robustly trained and characterized a CCS prediction model with high accuracy on diverse chemical structures. An all-in-one web interface (https://CCSbase.net) was built for querying the CCS database and accessing the predictive model to support the identification of unknown compounds.
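The cluster-then-predict idea described above can be illustrated with a minimal toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the data are synthetic, the two descriptors stand in for MQNs, plain k-means with two clusters stands in for the unsupervised clustering step, and a one-variable least-squares line stands in for the ML regressors. The function names (`make_toy_data`, `kmeans`, `fit_line`) are hypothetical, and errors are computed in-sample for brevity.

```python
import math
import random
from statistics import fmean


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-first initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    labels = []
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append(
                tuple(fmean(dim) for dim in zip(*members)) if members else centroids[c]
            )
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


def make_toy_data(n_per_class=100, seed=0):
    """Two synthetic 'chemical classes' with different size-to-CCS trends."""
    rng = random.Random(seed)
    descriptors, ccs = [], []
    # (second-descriptor center, CCS intercept, CCS slope): all made-up numbers
    for center, base, slope in [(0.0, 120.0, 80.0), (5.0, 160.0, 140.0)]:
        for _ in range(n_per_class):
            size = rng.random()  # hypothetical size-like descriptor in [0, 1)
            descriptors.append((size, rng.gauss(center, 0.2)))
            ccs.append(base + slope * size + rng.gauss(0.0, 1.5))
    return descriptors, ccs


def rmse(pred, true):
    return math.sqrt(fmean((p - t) ** 2 for p, t in zip(pred, true)))


descriptors, ccs = make_toy_data()
sizes = [d[0] for d in descriptors]

# Baseline: a single model fitted to all compounds at once.
g_slope, g_intercept = fit_line(sizes, ccs)
global_rmse = rmse([g_slope * s + g_intercept for s in sizes], ccs)

# Cluster on the descriptors, then fit one model per cluster.
_, labels = kmeans(descriptors, k=2)
models = {}
for c in set(labels):
    idx = [i for i, lab in enumerate(labels) if lab == c]
    models[c] = fit_line([sizes[i] for i in idx], [ccs[i] for i in idx])
cluster_pred = [models[lab][0] * s + models[lab][1] for s, lab in zip(sizes, labels)]
cluster_rmse = rmse(cluster_pred, ccs)

print(f"single model RMSE (in-sample): {global_rmse:.2f}")
print(f"per-cluster RMSE (in-sample):  {cluster_rmse:.2f}")
```

In this toy setting the per-cluster models win simply because the two synthetic classes follow different trends that one line cannot capture; with real descriptors, the number of clusters and the regressor would need to be chosen and validated on held-out data.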
