A Novel Chemical-Space-Dependent Strategy for Compound Selection in Non-target LC-HRMS Method Development Using Physicochemical and Structural Data
The virtual chemical space of substances, including emerging contaminants relevant to the environment and exposome, is rapidly expanding. Non-targeted analysis (NTA) by liquid chromatography–high-resolution mass spectrometry (LC-HRMS) is useful in measuring broad chemical space regions. Internal standards are typically used to optimize the selectivity and sensitivity of NTA LC-HRMS methods, assuming a linear relationship between structure and behavior across all analytes. However, this assumption fails for large, heterogeneous chemical spaces, narrowing measurable coverage to structurally similar compounds. We present a data-driven strategy for unbiased sampling of candidate structures for NTA LC-HRMS method development from extensive chemical spaces, such as the U.S. EPA’s CompTox (>1 million chemicals). The workflow maximizes physicochemical/structural diversity using precomputed PubChem descriptors (e.g., molecular weight, XLogP) and grants LC-HRMS compatibility thanks to predicted mobility and ionization efficiency from molecular fingerprints. The resulting measurable compound lists (MCLs) provide broad, heterogeneous coverage for NTA method development, validation, and boundary assessment. Applied to the CompTox space, the approach yielded MCLs with greater chemical coverage and broader predicted LC-HRMS applicability than conventional “watch list” contaminants, offering a robust framework for enhancing NTA’s measurable chemical space while preserving diversity.
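As a rough illustration of what "maximizing diversity" in a descriptor space can mean, the sketch below applies a greedy MaxMin selection to two descriptors named in the abstract (molecular weight and XLogP). This is not the authors' workflow: the choice of MaxMin, the min-max scaling, the Euclidean distance, the function names, and the candidate values are all assumptions made for illustration only.

```python
import math

def min_max_scale(rows):
    """Scale each descriptor column to [0, 1] so no single unit dominates."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]

def maxmin_select(rows, k, seed_index=0):
    """Greedy MaxMin: repeatedly add the point farthest from the current picks."""
    scaled = min_max_scale(rows)
    picked = [seed_index]
    # Distance from every point to its nearest already-picked point
    nearest = [math.dist(p, scaled[seed_index]) for p in scaled]
    while len(picked) < min(k, len(rows)):
        nxt = max(range(len(rows)), key=lambda i: nearest[i])
        picked.append(nxt)
        nearest = [min(d, math.dist(p, scaled[nxt])) for d, p in zip(nearest, scaled)]
    return picked

# Hypothetical (molecular weight, XLogP) pairs, for illustration only
candidates = [
    (150.0, 1.0),
    (152.0, 1.1),
    (480.0, 6.5),
    (300.0, -2.0),
    (155.0, 0.9),
]
selected = maxmin_select(candidates, k=3)
print(selected)
```

With these toy values the three near-identical small molecules contribute only one pick, and the remaining picks go to the structurally distant candidates, which is the behavior a diversity-driven selection is meant to produce.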
15
- 10.1038/s41592-023-02143-z
- Jan 8, 2024
- Nature Methods
- 10.1021/acs.analchem.5c00816
- Jun 13, 2025
- Analytical Chemistry
- 10.26434/chemrxiv-2025-xl6xl
- Jun 20, 2025
115
- 10.1186/s12302-023-00779-4
- Sep 4, 2023
- Environmental Sciences Europe
3
- 10.1016/j.aca.2024.342869
- Jun 20, 2024
- Analytica Chimica Acta
33
- 10.1021/acs.est.3c03606
- Sep 13, 2023
- Environmental Science & Technology
548
- 10.1021/ar500432k
- Feb 17, 2015
- Accounts of Chemical Research
22
- 10.1007/s00216-020-02716-3
- Jun 3, 2020
- Analytical and Bioanalytical Chemistry
93
- 10.1038/s41370-023-00574-6
- Jun 28, 2023
- Journal of Exposure Science & Environmental Epidemiology
26
- 10.1021/acs.est.4c01156
- Jul 10, 2024
- Environmental science & technology
- Research Article
1
- 10.1007/s00216-025-05919-8
- Jun 9, 2025
- Analytical and bioanalytical chemistry
Quantitative non-targeted analysis (qNTA) is an important tool for characterizing emerging contaminants in environmental, biological, and product-based samples. While traditional non-targeted analysis (NTA) focuses on chemical identification, qNTA additionally produces chemical concentration estimates. These estimates can inform provisional risk-based decisions and prioritize targets for follow-up analysis. Many common qNTA and "semi-quantitative" approaches rely on surrogate chemicals for calibration and model predictions. Despite their importance, surrogates are often chosen based on a combination of intuition and/or availability rather than rational (i.e., structure-based) selection. The lack of rational selection limits the degree to which qNTA can be objectively, mathematically assessed and improved. In this work, we systematically assess the extent to which chemical structure should inform the selection of qNTA surrogates using a dataset from liquid chromatography high-resolution mass spectrometry (LC-HRMS) experiments. First, we calculate a chemical space embedding using available LC-HRMS training data (n=385 chemicals) and 2D molecular descriptors deemed important to electrospray ionization efficiency. Then, using data from EPA's Non-Targeted Analysis Collaborative Trial (ENTACT), we calculate the leverage of measured analytes (n=533 chemicals) within the embedded chemical space. Based on leverage calculations, we implement multiple structure-based surrogate selection strategies and compare those to random selection using qNTA metrics for accuracy, uncertainty, and reliability. Finally, we propose and examine the "leveraged averaged representative distance" (LARD) as a means to quantify the coverage of qNTA surrogates within a defined chemical space. Our results show that qNTA models can benefit from rational surrogate selection strategies. 
They further show that a large enough random surrogate sample can perform as well as a smaller, chemically informed surrogate sample. Researchers are advised to carefully consider these findings when selecting surrogates for future qNTA studies.
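The leverage calculation mentioned in this abstract can be sketched for a two-dimensional embedding. The example assumes the standard definition h = xᵀ(XᵀX)⁻¹x with coordinates mean-centred on the training set; whether the study centres or scales the embedding this way is not stated in the abstract, and the function name and coordinates below are hypothetical.

```python
def leverage_2d(training, queries):
    """Leverage h = x^T (X^T X)^-1 x of each query point, where X holds the
    training coordinates mean-centred on the training set (2-D only)."""
    n = len(training)
    mx = sum(p[0] for p in training) / n
    my = sum(p[1] for p in training) / n
    centred = [(x - mx, y - my) for x, y in training]
    sxx = sum(x * x for x, _ in centred)
    syy = sum(y * y for _, y in centred)
    sxy = sum(x * y for x, y in centred)
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("training embedding is degenerate (collinear points)")
    # Inverse of [[sxx, sxy], [sxy, syy]] is [[syy, -sxy], [-sxy, sxx]] / det
    out = []
    for qx, qy in queries:
        x, y = qx - mx, qy - my
        out.append((syy * x * x - 2 * sxy * x * y + sxx * y * y) / det)
    return out

# Hypothetical embedding coordinates, for illustration only
training = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (4.0, 4.0)]
h_train = leverage_2d(training, training)
h_far = leverage_2d(training, [(10.0, -5.0)])[0]
print(h_train, h_far)
```

A point far from the bulk of the training data receives a high leverage, which is why leverage is a natural handle for structure-based surrogate selection strategies of the kind compared in the study.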
- Research Article
43
- 10.1007/s00216-022-04434-4
- Nov 26, 2022
- Analytical and Bioanalytical Chemistry
Non-targeted analysis (NTA) using high-resolution mass spectrometry allows scientists to detect and identify a broad range of compounds in diverse matrices for monitoring exposure and toxicological evaluation without a priori chemical knowledge. NTA methods present an opportunity to describe the constituents of a sample across a multidimensional swath of chemical properties, referred to as "chemical space." Understanding and communicating which region of chemical space is extractable and detectable by an NTA workflow, however, remains challenging and non-standardized. For example, many sample processing and data analysis steps influence the types of chemicals that can be detected and identified. Accordingly, it is challenging to assess whether analyte non-detection in an NTA study indicates true absence in a sample (above a detection limit) or is a false negative driven by workflow limitations. Here, we describe the need for accessible approaches that enable chemical space mapping in NTA studies, propose a tool to address this need, and highlight the different ways in which it could be implemented in NTA workflows. We identify a suite of existing predictive and analytical tools that can be used in combination to generate scores that describe the likelihood a compound will be detected and identified by a given NTA workflow based on the predicted chemical space of that workflow. Higher scores correspond to a higher likelihood of compound detection and identification in a given workflow (based on sample extraction, data acquisition, and data analysis parameters). Lower scores indicate a lower probability of detection, even if the compound is truly present in the samples of interest. Understanding the constraints of NTA workflows can be useful for stakeholders when results from NTA studies are used in real-world applications and for NTA researchers working to improve their workflow performance. 
The hypothetical ChemSpaceTool suggested herein could be used in both a prospective and retrospective sense. Prospectively, the tool can be used to further curate screening libraries and set identification thresholds. Retrospectively, false detections can be filtered by the plausibility of the compound identification by the selected NTA method, increasing the confidence of unknown identifications. Lastly, this work highlights the chemometric needs to make such a tool robust and usable across a wide range of NTA disciplines and invites others who are working on various models to participate in the development of the ChemSpaceTool. Ultimately, the development of a chemical space mapping tool strives to enable further standardization of NTA by improving method transparency and communication around false detection rates, thus allowing for more direct method comparisons between studies and improved reproducibility. This, in turn, is expected to promote further widespread applications of NTA beyond research-oriented settings.
- Research Article
57
- 10.1016/j.talanta.2020.121339
- Jul 7, 2020
- Talanta
Recent advances in non-targeted screening analysis using liquid chromatography - high resolution mass spectrometry to explore new biomarkers for human exposure
- Research Article
- 10.5194/acp-25-4367-2025
- Apr 22, 2025
- Atmospheric Chemistry and Physics
Abstract. Domestic biomass burning is a significant source of organic aerosol (OA) to the atmosphere; however, the understanding of OA composition under different burning conditions and after oxidation is largely unknown. Compositional analysis of OA is often limited by the lack of analytical standards available for quantification; however, semi-quantitative non-target analysis (NTA) can overcome these limitations by enabling the detection of thousands of compounds and quantification via surrogate standards. A series of controlled-burn experiments were conducted at the Manchester Aerosol Chamber to investigate domestic biomass-burning OA (BBOA) under different burning conditions and the impact of atmospheric ageing. Insights into the chemical composition of fresh and aged OA from flaming-dominated and smouldering-dominated combustion were obtained via a newly developed semi-quantitative NTA approach using ultra-high-performance liquid chromatography high-resolution mass spectrometry. Aerosol from smouldering-dominated burns contained significant organic carbon content, whereas under flaming-dominated conditions it was primarily black carbon. The detectable OA mass from both conditions was dominated by oxygenated compounds (CHO) (≈ 90 %) with smaller contributions from organonitrogen species. Primary OA (POA) had a high concentration of C8–C17 CHO compounds, with both burns exhibiting a peak between C8–C11. However, flaming-dominated POA exhibited a greater contribution of C13–C17 CHO species. More than 50 % of the CHO mass in POA was determined as aromatic by the aromaticity index, largely in the form of functionalised monoaromatic compounds. After ageing, the aromatic contribution to the total CHO mass decreased with a greater loss for smouldering (−53 %) than flaming (−16 %) due to the increased reduction of polyaromatic compounds under smouldering conditions. 
The O:C ratios of the aged OA from flaming and smouldering were consistent with those from the oxidation of aromatic compounds (0.57–1.00), suggesting that compositional changes upon ageing were driven by the oxidation of aromatic compounds and the loss of aromaticity. However, there was a greater probability of O:C ratios ≥ 0.8 in aged smouldering OA, indicating the presence of more oxidised species. This study presents the first reported quantitative non-target compositional analysis of domestic BBOA using retention window scaling and demonstrates that compositional changes between burn phase and after ageing may have important consequences for exposure to such emissions in residential settings.
- Research Article
- 10.1016/j.aca.2025.344215
- Aug 1, 2025
- Analytica chimica acta
Development of a quantitative structure-response relationships to estimate concentrations of plasticizer metabolites in urine without reference standards using non-targeted analysis with liquid chromatography high-resolution mass spectrometry.
- Research Article
- 10.1016/j.envres.2024.120494
- Nov 30, 2024
- Environmental Research
Non-targeted analysis and suspect screening of organic contaminants in temperate snowfall using liquid chromatography high-resolution mass spectrometry
- Research Article
5
- 10.1007/s00216-025-05771-w
- Feb 15, 2025
- Analytical and bioanalytical chemistry
The US Environmental Protection Agency (EPA) uses non-targeted analysis (NTA) to characterize potential risks associated with environmental pollutants and anthropogenic materials. NTA is used throughout EPA's Office of Research and Development (ORD) to support the needs of states, tribes, EPA regions, EPA program offices, and other outside partners. NTA methods are complex and conducted via myriad instrumental platforms and software products. Comprehensive standards do not yet exist to guide NTA quality assurance/quality control (QA/QC) procedures. Furthermore, no single software tool meets EPA's needs for QA/QC review and documentation. Considering these factors, ORD developed "INTERPRET NTA" (Interface for Processing, Reviewing, and Translating NTA data) to support liquid chromatography (LC) high-resolution mass spectrometry (HRMS) NTA experiments. For purposes of NTA QA/QC, INTERPRET NTA (1) calculates data quality statistics related to accuracy, precision, and reproducibility; (2) produces interactive visualizations to facilitate quality threshold optimization; and (3) outputs comprehensive documentation for inclusion in official reports and research publications. INTERPRET NTA has additional functionality to facilitate rapid chemical identification and risk-based prioritization. The current article describes only the QA/QC elements of INTERPRET NTA's MS1 workflow, which are demonstrated using published data from a de facto water reuse study. INTERPRET NTA, in its current form, exists primarily to meet the needs of EPA and its partners, but a public release is planned. Workflows, terminology, and outputs of INTERPRET NTA provide a focal point for necessary discussions on the harmonization of NTA QA/QC practices.
- Research Article
1
- 10.1016/j.scitotenv.2024.176922
- Oct 18, 2024
- Science of the Total Environment
Investigating the chemical space coverage of multiple chromatographic and ionization methods using non-targeted analysis on surface and drinking water collected using passive sampling
- Research Article
50
- 10.1016/j.trac.2021.116188
- Jan 15, 2021
- TrAC Trends in Analytical Chemistry
Data processing strategies for non-targeted analysis of foods using liquid chromatography/high-resolution mass spectrometry
- Research Article
21
- 10.1007/s00216-015-9286-x
- Jan 12, 2016
- Analytical and Bioanalytical Chemistry
In the present study, the application of a liquid chromatography high-resolution mass spectrometry (LC-HRMS) analytical assay for the quantitative analysis of a recombinant human immunoglobulin G1 (hIgG1) in rat serum is reported using three generic peptides GPSVFPLAPSSK (GPS), TTPPVLDSDGSFFLYSK (TTP), and VVSVLTVLHQDWLNGK (VVS). Moreover, the deamidation site of a fourth peptide FNWYVDGVEVHNAK (FNW) was identified and further excluded from the assay evaluation due to the inaccuracy of the quantitative results. The rat serum samples were spiked with a fully labeled hIgG1 as internal standard (ISTD). The digestion with trypsin was performed onto the pellet prior to peptide analysis by LC-HRMS using a quadrupole time of flight (QTOF) mass analyzer operating in selected reaction monitoring (SRM) mode with enhanced duty cycles (EDC). The assay linearity for the three investigated peptides was established for a hIgG1 (hIgG1A) from 1.00 to 1000 μg mL⁻¹ with a mean coefficient of determination (R²) higher than 0.9868. The inter-day accuracy and precision obtained in rat serum over 3 days were ≤11.4 and ≤10.5%, respectively. Short-term stability on the auto-sampler at 6 °C for 30 h, at RT for 48 h, and a 100-fold dilution factor were demonstrated. In addition, QC samples prepared in cynomolgus monkey serum and measured with the present method met the acceptance criteria of ±20.0 and ≤20.0% for all three peptides regarding accuracy and precision, respectively. The LC-HRMS method was applied to the analysis of samples from five individual cynomolgus monkeys dosed with a second hIgG1 (hIgG1B) and consistent data were obtained compared to the LC-MS/MS method (conventional triple quadrupole (QqQ) mass analyzer operating in SRM). The present data demonstrate that LC-HRMS can be used for the quantitative analysis of hIgG1 in both species and that quantification is not only limited to classical QqQ instruments.
- Research Article
93
- 10.1038/s41370-023-00574-6
- Jun 28, 2023
- Journal of Exposure Science & Environmental Epidemiology
Non-targeted analysis (NTA) and suspect screening analysis (SSA) are powerful techniques that rely on high-resolution mass spectrometry (HRMS) and computational tools to detect and identify unknown or suspected chemicals in the exposome. Fully understanding the chemical exposome requires characterization of both environmental media and human specimens. As such, we conducted a review to examine the use of different NTA and SSA methods in various exposure media and human samples, including the results and chemicals detected. The literature review was conducted by searching literature databases, such as PubMed and Web of Science, for keywords, such as “non-targeted analysis”, “suspect screening analysis” and the exposure media. Sources of human exposure to environmental chemicals discussed in this review include water, air, soil/sediment, dust, and food and consumer products. The use of NTA for exposure discovery in human biospecimen is also reviewed. The chemical space that has been captured using NTA varies by media analyzed and analytical platform. In each media the chemicals that were frequently detected using NTA were: per- and polyfluoroalkyl substances (PFAS) and pharmaceuticals in water, pesticides and polyaromatic hydrocarbons (PAHs) in soil and sediment, volatile and semi-volatile organic compounds in air, flame retardants in dust, plasticizers in consumer products, and plasticizers, pesticides, and halogenated compounds in human samples. Some studies reviewed herein used both liquid chromatography (LC) and gas chromatography (GC) HRMS to increase the detected chemical space (16%); however, the majority (51%) only used LC-HRMS and fewer used GC-HRMS (32%). Finally, we identify knowledge and technology gaps that must be overcome to fully assess potential chemical exposures using NTA. 
Understanding the chemical space is essential to identifying and prioritizing gaps in our understanding of exposure sources and prior exposures. Impact statement: This review examines the results and chemicals detected by analyzing exposure media and human samples using high-resolution mass spectrometry based non-targeted analysis (NTA) and suspect screening analysis (SSA).
- Preprint Article
- 10.5194/egusphere-egu24-10676
- Jan 20, 2025
The increasing worldwide release of anthropogenic chemical compounds into aquatic ecosystems has led to serious contamination of freshwater resources. This study investigated the chemical composition of the water and sediments of L'Albufera Natural Park, Valencia, Spain, an area heavily impacted by intensive agriculture, surrounded by an industrial belt, highly urbanized and historically polluted. The goal was to assess the different water sources and anthropogenic influence in this managed area using nontarget analysis (NTA) combined with high-resolution mass spectrometry (HRMS). Surface water and sediment samples were collected from 51 sites during two sampling events in May/June 2019 and September/October 2019. These two periods were selected because rice is the most relevant crop in the area, and they coincide with the start of cultivation and with the harvest. The HRMS data were processed using Compound Discoverer™ version 3.3, and the results were analyzed using Principal Component Analysis (PCA). Agricultural practices are one of the most important sources of contaminants (mostly pesticides); those found at concentrations >100 ng L⁻¹ include acetamiprid, azoxystrobin, chlorfenvinphos, chlorpyrifos, difenoconazole, dimethoate, fluvalinate, imazalil, imidacloprid, omethoate, propazine, tebuconazole, terbumeton deethyl, terbuthylazine, thiabendazole and tricyclazole. An increased presence and intensity of organic contaminants along the waterway was observed, indicating significant anthropogenic influence in the area. The NTA and post-processing were evaluated for reproducibility, demonstrating robustness with a 71.2% average reproducibility for compounds detected across the two sampling trips. A detection frequency of 80% was the criterion set for detected compounds suggested as tracers. To prioritize samples, hierarchical cluster analysis was employed, and potential tracers for each water source were determined. 
Additionally, urban-influenced contaminants such as insect repellents, pharmaceuticals, and non-agricultural herbicides were identified along the channels that transport treated wastewater to the Natural Park. This study highlights the impact of human activities on L’Albufera Natural Park and demonstrates the effectiveness of NTA in differentiating and tracking water sources. The results emphasize the importance of reproducibility in NTA and provide guidance on implementing monitoring strategies by prioritizing samples based on chemical compositions.
- Research Article
15
- 10.1016/j.scitotenv.2020.136835
- Jan 21, 2020
- Science of The Total Environment
Suspect and non-target screening of acutely toxic Prymnesium parvum.
- Research Article
160
- 10.3109/15563650.2012.713108
- Aug 13, 2012
- Clinical Toxicology
Background. Gas chromatography (GC) and liquid chromatography (LC) coupled with mass spectrometry (MS) are widely used to confirm drug screening results and for urine screening in presumed intoxicated patients. These techniques are better suited to targeted analysis than to general unknown screening and, due to the complexity of testing, results are seldom available rapidly enough to contribute to the immediate care of the patient. High resolution (HR)/MS with time-of-flight (TOF) or orbitrap instruments offer potential advantages in clinical toxicology. Comparison of GC-MS, LC-MS/MS and LC-HR/MS. For unknown analyses, GC-MS and LC-MS/MS require comparison of full-scan spectra against preestablished libraries. Operation in full-scan mode greatly reduces sensitivity and some drugs present in low but significant concentrations may be missed. Selected ion monitoring (SIM) in GC/MS and selected reaction monitoring (SRM) in LC-MS/MS, where only targeted ions are monitored, increase sensitivity but require prior knowledge of what compound is to be measured. LC-HR/MS offers mass assignment with an accuracy of 0.001 atomic mass units (amu) compared with 1 amu in conventional MS. Tentative identification is thus directed to a very limited set of compounds (or even one unique compound) based on the exact molecular formula rather than a fragmentation pattern, since HR/MS can discriminate between compounds with the same nominal molecular mass. LC-MS/MS has clear advantages over GC/MS in ease and speed of sample preparation and the opportunities for its automation. LC-HR/MS is more suitable to clinical toxicology because the drugs present in a sample are rarely known a priori, and tentative identifications of unknowns can be made without the availability of a reference standard or a library spectrum. Blood can be used in preference to urine which is more relevant to the patient's current clinical situation. Methods. 
A literature search was conducted using PUBMED for clinical toxicology, adulterants in illicit drugs and herbal supplements, and case reports using LC-TOF/MS and LC-HR/MS. Only 42 papers in English were identified in these searches. LC-HR/MS in clinical toxicology. LC-HR/MS has been used to detect designer drugs, doping agents (neurosteroids) and adulterants such as levamisole, a veterinary anthelmintic found in street cocaine, and pharmaceuticals in herbal medications marketed to contain only natural ingredients. LC-HR/MS has proved useful for cases where existing tests were unable to identify the cause of the intoxication. One patient suffered a drug-induced seizure which was originally thought to be caused by an herbal medication, but diphenhydramine was determined to be the culprit. In another, 5-oxoproline was identified as the cause of metabolic acidosis seen in chronic acetaminophen (paracetamol) use. LC-HR/MS has successfully identified medications that were mislabeled or misrepresented street drugs. In one case, medications sold as diazepam were determined to be glyburide instead. The identification of novel designer amines, stimulants found in “bath salts”, and synthetic cannabinoids is well suited to LC-HR/MS. Dozens or even hundreds of possible compounds cannot realistically be tested on an individual basis by targeted LC-MS/MS or GC/MS analysis. Conclusions. LC-HR/MS offers unique opportunities for time-sensitive clinical analysis of blood samples from intoxicated patients and for comprehensive screening in a wide range of situations and materials. While the identification is not as definitive as that obtained by conventional fragmentation MS, the presumptive identification can be confirmed later with standards and spectral library matches. Optimum utilization of the presumptive diagnosis requires close collaboration between the laboratory analysts and their clinical counterparts.
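The point made in this abstract about mass accuracy (0.001 amu versus 1 amu) can be illustrated with a small sketch. It uses three textbook species that share a nominal mass of 28 (CO, N2, C2H4) rather than any compound from the review. The tolerance values mirror the two accuracies quoted in the abstract, while the function names and the measured value are assumptions made for illustration.

```python
# Monoisotopic masses of the most abundant isotopes (u)
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}

def mono_mass(formula):
    """Monoisotopic mass of a formula given as {element: count}."""
    return sum(MONO[el] * n for el, n in formula.items())

def matches(measured, candidates, tol):
    """Names of candidates whose mass lies within +/- tol of the measured mass."""
    return [
        name for name, formula in candidates.items()
        if abs(mono_mass(formula) - measured) <= tol
    ]

# Three species with the same nominal mass (28), used purely as an illustration
candidates = {
    "CO": {"C": 1, "O": 1},
    "N2": {"N": 2},
    "C2H4": {"C": 2, "H": 4},
}
measured = 28.0061  # hypothetical measured neutral mass

unit_resolution = matches(measured, candidates, tol=1.0)
high_resolution = matches(measured, candidates, tol=0.001)
print(unit_resolution, high_resolution)
```

At a 1 amu tolerance all three candidates remain plausible, whereas at 0.001 amu only one survives, which is the sense in which accurate mass narrows tentative identification to a very limited set of formulas.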
- Research Article
- 10.1021/acs.est.5c03068
- May 9, 2025
- Environmental science & technology
A comprehensive assessment of pesticide transport in surface waters is challenging due to discharge characteristics and the occurrence of transformation products (TPs). Detailed long-term sampling of pesticide concentrations, including rainfall and pesticide application events, is still lacking to better predict pesticide transport pathways and toxicity within agricultural catchments. In the present study, pesticide and TP transport dynamics were evaluated over a three-year monitoring period, which included 12 stormwater events and 7 dry events. An extensive target screening for 328 pesticides was conducted, while simultaneously performing suspect and nontarget analysis (SNTA) using liquid chromatography high-resolution mass spectrometry. Twenty-one pesticides and two TPs associated with the main crop, rice, were identified as the major pollutants. The risk assessment results, based on the stepwise toxicity data collection, suggested that insecticides, primarily neonicotinoids, exhibited severe ecological risk. Additionally, SNTA revealed the presence of 8 parent compounds and 46 TPs. TPs occurred following parent peak periods, indicating that integrated pesticide monitoring is a practical approach to risk assessment. A precautionary approach using SNTA of parent pesticides and TP identification suggests that the potential aquatic effects of pesticide TPs may be underestimated by a conventional pesticide monitoring strategy.
- Research Article
- 10.1021/acs.estlett.5c00774
- Sep 9, 2025
- Environmental Science & Technology Letters
- Research Article
- 10.1021/acs.estlett.5c00587
- Sep 8, 2025
- Environmental science & technology letters
- Research Article
- 10.1021/acs.estlett.5c00759
- Aug 18, 2025
- Environmental Science & Technology Letters
- Research Article
- 10.1021/acs.estlett.5c00516
- Aug 13, 2025
- Environmental Science & Technology Letters
- Research Article
- 10.1021/acs.estlett.5c00688
- Aug 12, 2025
- Environmental Science & Technology Letters
- Research Article
- 10.1021/acs.estlett.5c00448
- Jul 29, 2025
- Environmental science & technology letters
- Research Article
- 10.1021/acs.estlett.5c00509
- Jul 17, 2025
- Environmental science & technology letters
- Research Article
- 10.1021/acs.estlett.5c00590
- Jul 17, 2025
- Environmental science & technology letters
- Research Article
- 10.1021/acs.estlett.5c00505
- Jul 2, 2025
- Environmental science & technology letters
- Research Article
- 10.1021/acs.estlett.5c00501
- Jun 24, 2025
- Environmental Science & Technology Letters