Abstract
The most recent version of the Cahn-Ingold-Prelog rules for the determination of stereodescriptors, as described in Nomenclature of Organic Chemistry: IUPAC Recommendations and Preferred Names 2013 (the "Blue Book"; Favre and Powell, Royal Society of Chemistry, 2014; http://dx.doi.org/10.1039/9781849733069), was analyzed by an international team of cheminformatics software developers. Algorithms for machine implementation were designed, tested, and cross-validated. Deficiencies in Sequence Rules 1b and 2 were found, and proposed language for their modification is presented. A concise definition of an additional rule ("Rule 6", below) is proposed, which succinctly covers several cases only tangentially mentioned in the 2013 recommendations. Each rule is discussed from the perspective of machine implementation. The four resultant implementations are supported by a 300-compound validation suite in both 2D and 3D structure data file (SDF) format as well as SMILES (https://cipvalidationsuite.github.io/ValidationSuite). The validation suites include all significant examples in Chapter 9 of the Blue Book, as well as several additional structures that highlight more complex aspects of the rules not addressed or not clearly analyzed in that work. These additional structures support the case for modifying the Sequence Rules.
Highlights
In the 60+ years since the introduction of the Cahn-Ingold-Prelog Sequence Rules in 1956 [1], the “CIP Rules” have become an integral part of chemical nomenclature, providing a way to identify the spatial arrangement of the atoms of a molecule using simple, mostly atom- or bond-based stereodescriptors.
The four resultant implementations are supported by validation suites in 2D and 3D SDF format as well as SMILES.
To provide a single resource summarizing the state of the evolving rules, the International Union of Pure and Applied Chemistry (IUPAC) published the first comprehensive description of the CIP Rules in Nomenclature of Organic Chemistry: IUPAC Recommendations and Preferred Names 2013 [4].
Summary
In the 60+ years since the introduction of the Cahn-Ingold-Prelog Sequence Rules in 1956 [1], the “CIP Rules” have become an integral part of chemical nomenclature, providing a way to identify the spatial arrangement of the atoms of a molecule using simple, mostly atom- or bond-based stereodescriptors. The issue is just one specific case of a more general problem: the absence of procedures for assigning root distances to duplicate atoms that result from the averaging of atomic numbers. The implementation problem in this case relates to comparisons where one atom has an isotope indicated and the other does not, and (again) to cases where several alternative Kekulé structures are involved.