Abstract

The derivation of structural characteristics of a compound of unknown structure from its spectral data is a central procedure in modern structure elucidation. Computer searching for substructures or full structures in libraries of fully assigned 13C NMR spectra is an essential part of structure elucidation and has been widely applied, because this type of spectrum reflects the nature of the skeletal backbone of an organic compound, information not as readily available from other spectroscopic techniques [1]. In this paper we describe an extensive test of a previously developed method for interpretive search in spectral libraries of fully assigned 13C NMR spectra [2]. The method is implemented in a user-friendly Windows-based program called Infer C NMR. The program input consists of the 13C NMR spectrum of the unknown compound (chemical shift and multiplicity of each signal) and its molecular formula. The search algorithm retrieves a list of connected substructures from the reference compounds in such a way that only atoms with matched signals are included in the inferred substructures. The substructures are explicitly defined in terms of atom type, hydrogen multiplicity, and bond type. They are presented embedded in the reference structures and are sorted according to their reliability (an estimate of their correctness, usually called accuracy). This accuracy is calculated by a multivariate function that was obtained in advance from comprehensive statistics. The parameters that restrict the search algorithm are the tolerance of signal matching (Tol), in ppm, and the minimum number of carbon atoms in the inferred (retrieved) substructures (m.n.c.); the latter is set to six for this study. Although our program for interpretive library search is intended to serve as a useful stand-alone application for the spectroscopist, its output can also be sent as input to a computer-enhanced structure elucidation system, such as SESAMI [3].
In this mode, one or more of the retrieved substructures act as constraints on the structure generation process, serving to reduce the number of plausible alternative structures that the structure generator presents to the chemist. The greater the number and information content of the constraints, the greater the efficiency of the structure generator and the fewer the structures produced. It is important to recognize that if a substructure predicted as present is handed to the structure generator, every structure output will contain that substructure. Thus, if even one of the retrieved substructures used as constraints is incorrect, either every structure produced by the structure generator will be invalid (the worst scenario) or no output structures will be generated because the constraints contradict each other or other spectral data (a better scenario). That is why the output from the interpretive 13C NMR library search must have two very important features: high information content and high reliability. As described in a previous paper [2], the accuracy function was tested with a large validation set of nearly 12,740 spectra by leave-one-out cross-validation. These spectra were part of the library, and some of them belong not to natural compounds but to smaller ones produced by chemical synthesis. That is why the present test gives a better estimate of the real-world capabilities of the interpretive library search. One hundred and four 13C NMR spectra of compounds isolated from plants and published in Phytochemistry (year 2002, volumes 58–59) were searched in a library of 38,225 fully assigned 13C NMR spectra. Four spectra retrieved no substructures.
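
The retrieval step described above (match signals within Tol, keep only matched atoms, report connected substructures of at least m.n.c. carbons) can be illustrated with a minimal Python sketch. This is not the Infer C NMR implementation: the function names, the data layout, and the example shift values are hypothetical, and the sketch is deliberately simplified. It treats the reference compound as a carbon-only graph, ignores heteroatoms, bond types, and the molecular formula, does not enforce a one-to-one assignment between query and reference signals, and omits the accuracy function entirely.

```python
from collections import defaultdict


def matched_atoms(query, ref_signals, tol):
    """Return ids of reference carbons whose signal matches some query signal.

    A match requires the same multiplicity and a shift difference <= tol (ppm).
    Simplification: several reference atoms may match the same query signal.
    """
    return {
        atom
        for atom, (shift, mult) in ref_signals.items()
        if any(mult == q_mult and abs(shift - q_shift) <= tol
               for q_shift, q_mult in query)
    }


def inferred_substructures(query, ref_signals, ref_bonds, tol=1.0, mnc=6):
    """Return connected sets of matched carbons with at least mnc atoms."""
    matched = matched_atoms(query, ref_signals, tol)

    # Keep only bonds whose two ends are both matched atoms.
    adjacency = defaultdict(set)
    for a, b in ref_bonds:
        if a in matched and b in matched:
            adjacency[a].add(b)
            adjacency[b].add(a)

    # Depth-first search for connected components among matched atoms.
    seen, result = set(), []
    for start in sorted(matched):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            atom = stack.pop()
            if atom in component:
                continue
            component.add(atom)
            stack.extend(adjacency[atom] - component)
        seen |= component
        if len(component) >= mnc:
            result.append(component)
    return result


# Invented example data (not from the paper): an 8-carbon chain as reference.
ref_signals = {
    1: (14.1, "q"), 2: (22.7, "t"), 3: (31.9, "t"), 4: (29.3, "t"),
    5: (25.7, "t"), 6: (32.8, "t"), 7: (62.9, "t"), 8: (170.0, "s"),
}
ref_bonds = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8)]
query = [
    (14.0, "q"), (22.6, "t"), (31.8, "t"), (29.4, "t"),
    (25.9, "t"), (32.6, "t"), (170.3, "s"),
]

print(inferred_substructures(query, ref_signals, ref_bonds, tol=0.5, mnc=6))
```

In this toy case carbon 7 has no matching query signal, so it splits the chain: carbons 1 to 6 form a six-atom substructure that passes the m.n.c. threshold, while the isolated carbon 8 is discarded. Raising `mnc` or lowering `tol` shrinks the output, which mirrors the trade-off between information content and reliability discussed in the abstract.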
