Abstract

High-resolution mass spectrometry (HRMS) data has revolutionized the identification of environmental contaminants through non-targeted analysis (NTA). However, chemical identification remains challenging due to the vast number of unknown molecular features typically observed in environmental samples. Advanced data processing techniques are required to improve chemical identification workflows. The ideal workflow brings together a variety of data and tools to increase the certainty of identification. One such tool is chromatographic retention time (RT) prediction, which can be used to reduce the number of possible suspect chemicals within an observed RT window. This paper compares the relative predictive ability and applicability to NTA workflows of three RT prediction models: (1) a logP (octanol-water partition coefficient)-based model using EPI Suite™ logP predictions; (2) a commercially available ACD/ChromGenius model; and (3) a newly developed Quantitative Structure Retention Relationship model called OPERA-RT. Models were developed using the same training set of 78 compounds with experimental RT data and evaluated for external predictivity on an identical test set of 19 compounds. Both the ACD/ChromGenius and OPERA-RT models outperformed the EPI Suite™ logP-based RT model (R² = 0.81–0.92, 0.86–0.83, and 0.66–0.69 for the training–test sets, respectively). Further, both OPERA-RT and ACD/ChromGenius predicted 95% of RTs within a ±15% chromatographic time window of experimental RTs. Based on these results, we simulated an NTA workflow with a ten-fold larger list of candidate structures generated for formulae of the known test set chemicals using the U.S. EPA's CompTox Chemistry Dashboard (https://comptox.epa.gov/dashboard). RTs for all candidates were predicted using both ACD/ChromGenius and OPERA-RT, and RT screening windows were assessed for their ability to filter out unlikely candidate chemicals and enhance potential identification. Compared to ACD/ChromGenius, OPERA-RT screened out a greater percentage of candidate structures within a 3-min RT window (60% vs. 40%) but retained fewer of the known chemicals (42% vs. 83%). By several metrics, the OPERA-RT model, generated as a proof-of-concept using a limited set of open-source data, performed as well as the commercial tool ACD/ChromGenius when constrained to the same small training and test sets. As the availability of RT data increases, we expect the OPERA-RT model's predictive ability will increase.
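
The RT screening step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Candidate` class, the `screen_by_rt` function, the compound names, and all RT values are hypothetical, and the window is assumed to be symmetric around the observed RT (the abstract does not say whether the 3-min window is a total width or a half-width).

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical candidate structure with a model-predicted RT."""
    name: str
    predicted_rt: float  # minutes


def screen_by_rt(candidates, observed_rt, half_width):
    """Keep candidates whose predicted RT lies within
    observed_rt +/- half_width (assumed symmetric window, in minutes)."""
    return [c for c in candidates
            if abs(c.predicted_rt - observed_rt) <= half_width]


# Illustrative values only; these are not data from the study.
candidates = [
    Candidate("A", 4.2),
    Candidate("B", 7.9),
    Candidate("C", 5.5),
]
kept = screen_by_rt(candidates, observed_rt=5.0, half_width=1.5)
screened_out_fraction = 1 - len(kept) / len(candidates)
print([c.name for c in kept], round(screened_out_fraction, 2))
```

A narrower window screens out more candidates but raises the risk of discarding the true chemical, which is the trade-off the 60% vs. 40% and 42% vs. 83% figures describe.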
