SAND: Automated Time-Domain Modeling of NMR Spectra Applied to Metabolite Quantification.

Yue Wu, Arthur S Edison, Jeffrey C Hoch, Krish Krishnamurthy, Omid Sanati, Mario Uchimiya, Frank Delaglio, Jonathan Wedell

doi:10.1021/acs.analchem.3c03078

Open Access

Journal: Analytical Chemistry
Publication Date: Jan 26, 2024
Citations: 2
License type: CC BY-NC-ND 4.0

Affiliation: University of Georgia

Abstract

Developments in untargeted nuclear magnetic resonance (NMR) metabolomics enable the profiling of thousands of biological samples. Exploiting this rich source of information requires detailed quantification of spectral features. However, developing a consistent and automatic workflow has been challenging because of extensive signal overlap. To address this challenge, we introduce the software Spectral Automated NMR Decomposition (SAND). SAND builds on the previous success of time-domain modeling and automatically quantifies entire spectra without manual interaction. The SAND approach uses hybrid optimization with Markov chain Monte Carlo methods, employing subsampling in both the time and frequency domains. In particular, SAND randomly divides the time-domain data into training and validation sets to help avoid overfitting. We demonstrate the accuracy of SAND, which achieves a correlation of ∼0.9 with ground truth on test cases including highly overlapped simulated data sets, a two-compound mixture, and a urine sample spiked with different amounts of a four-compound mixture. We further demonstrate automated annotation using correlation networks derived from SAND-decomposed peaks; on average, 74% of the peaks for each compound can be recovered in a single cluster. SAND is available in NMRbox, the cloud computing environment for NMR software hosted by the Network for Advanced NMR (NAN). Because SAND uses time-domain subsampling (i.e., a random subset of time-domain points), it has the potential to be extended to higher-dimensional and nonuniformly sampled data.
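
The training/validation idea for time-domain data can be illustrated with a minimal, standard-library-only Python sketch. It is not SAND's implementation: it assumes a simple model in which the free induction decay (FID) is a sum of damped complex exponentials (one Lorentzian line each), holds every line's frequency and decay rate fixed, and fits only the complex amplitudes by linear least squares in place of SAND's hybrid Markov chain Monte Carlo optimization. All function names (`simulate_fid`, `split_time_points`, `fit_amplitudes`, `rms_residual`, `solve_complex`) and the simulated line parameters are hypothetical. Time points are split at random into training and validation sets, each candidate model is fit on the training points only, and the residual on the held-out points is used to compare models.

```python
import cmath
import math
import random

# Hypothetical sketch, NOT SAND's code: the FID is modeled as a sum of damped complex
# exponentials with fixed frequencies/decay rates, and only the amplitudes are fitted.


def damped_exponential(t, freq_hz, decay_per_s):
    """Basis function exp((i*2*pi*f - R) * t); its Fourier transform is a Lorentzian line."""
    return cmath.exp(complex(-decay_per_s, 2 * math.pi * freq_hz) * t)


def simulate_fid(components, n_points, dwell_s, noise_sd, seed=0):
    """Synthetic FID: sum of (amplitude, freq_hz, decay_per_s) lines plus complex Gaussian noise."""
    rng = random.Random(seed)
    fid = []
    for n in range(n_points):
        t = n * dwell_s
        signal = sum(a * damped_exponential(t, f, r) for a, f, r in components)
        fid.append(signal + complex(rng.gauss(0, noise_sd), rng.gauss(0, noise_sd)))
    return fid


def split_time_points(n_points, train_fraction=0.8, seed=0):
    """Randomly partition time-point indices into disjoint training and validation sets."""
    idx = list(range(n_points))
    random.Random(seed).shuffle(idx)
    cut = int(train_fraction * n_points)
    return sorted(idx[:cut]), sorted(idx[cut:])


def solve_complex(a, b):
    """Solve the small complex system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_amplitudes(fid, indices, dwell_s, lines):
    """Least-squares complex amplitudes for fixed (freq_hz, decay_per_s) lines, using only `indices`."""
    basis = [[damped_exponential(i * dwell_s, f, r) for f, r in lines] for i in indices]
    k = len(lines)
    # Normal equations (B^H B) a = B^H y, restricted to the chosen time points.
    gram = [[sum(row[p].conjugate() * row[q] for row in basis) for q in range(k)] for p in range(k)]
    rhs = [sum(row[p].conjugate() * fid[i] for row, i in zip(basis, indices)) for p in range(k)]
    return solve_complex(gram, rhs)


def rms_residual(fid, indices, dwell_s, lines, amps):
    """Root-mean-square model residual over the given time points."""
    total = 0.0
    for i in indices:
        t = i * dwell_s
        model = sum(a * damped_exponential(t, f, r) for a, (f, r) in zip(amps, lines))
        total += abs(fid[i] - model) ** 2
    return math.sqrt(total / len(indices))


if __name__ == "__main__":
    dwell = 1e-3  # 1 ms dwell time -> 1000 Hz spectral width (illustrative values)
    fid = simulate_fid([(1.0, 120.0, 8.0), (0.6, 135.0, 10.0)], 512, dwell, noise_sd=0.02, seed=1)
    train, valid = split_time_points(len(fid), train_fraction=0.8, seed=2)
    for lines in ([(120.0, 8.0)], [(120.0, 8.0), (135.0, 10.0)]):
        amps = fit_amplitudes(fid, train, dwell, lines)
        print(len(lines), "line(s): validation RMS =", round(rms_residual(fid, valid, dwell, lines, amps), 4))
```

In this toy setting, the one-line model leaves a large residual on the validation points, while the correct two-line model drops to roughly the noise level. Adding lines to a least-squares fit can only lower (or leave unchanged) the training residual, so scoring on time points the fit never saw is what exposes whether an extra component is real; a spurious line typically gives no comparable gain on the validation set.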

Full Text