Abstract
Presently, software tools for retrosynthetic analysis are widely used by organic, medicinal, and computational chemists. Rule-based systems make extensive use of collections of retro-reactions (transforms). While there are many public datasets with reactions in the synthetic direction (usually non-generic reactions), there are no publicly available databases of generic reactions in a computer-readable format that can be used for retrosynthetic analysis. Here we present RetroTransformDB, a dataset of transforms that we compiled and coded in SMIRKS line notation. The collection comprises more than 100 records, each including the reaction name, the SMIRKS linear notation, the functional group to be obtained, and the transform type classification. All SMIRKS transforms were tested syntactically, semantically, and from a chemical point of view in different software platforms. The overall dataset design and the retrosynthetic fitness were analyzed and curated by organic chemistry experts. The RetroTransformDB dataset may be used by open-source and commercial software packages, as well as chemoinformatics tools.
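As a rough illustration of the record layout described above, the four fields could be held in a small Python structure. This is only a sketch under stated assumptions: the class name `TransformRecord`, the field names, the `sides` helper, and the example SMIRKS string are all hypothetical and are not taken from RetroTransformDB. It also assumes a transform written without an agents section, so that the two sides are separated by `>>`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransformRecord:
    """One dataset record; field names are illustrative assumptions."""
    name: str              # reaction name
    smirks: str            # SMIRKS linear notation of the transform
    functional_group: str  # functional group to be obtained
    transform_type: str    # transform type classification

    def sides(self):
        """Split the SMIRKS into (left, right) patterns around '>>'."""
        left, sep, right = self.smirks.partition(">>")
        if not sep or not left or not right:
            raise ValueError(f"not a two-sided SMIRKS: {self.smirks!r}")
        return left, right


# Hypothetical example record (not an entry of the actual dataset):
# an ester in the target is cut back to an acid and an alcohol.
record = TransformRecord(
    name="Ester disconnection (illustrative)",
    smirks="[C:1](=[O:2])[O:3][C:4]>>[C:1](=[O:2])[OH].[OH:3][C:4]",
    functional_group="ester",
    transform_type="disconnection",  # assumed label, for illustration only
)

target_pattern, precursor_pattern = record.sides()
precursors = precursor_pattern.split(".")  # '.' separates components
```

For a retro-reaction the left-hand pattern matches the target molecule and the right-hand pattern describes the precursors, which is why the helper names them in that order.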
Highlights
Retrosynthetic analysis is one of the main tasks in the planning of organic synthesis and a milestone in the computer-aided synthesis design
The SMIRKS linear notation [31] is used for describing the transforms in our collection
The rich SMIRKS syntax maintains sufficient functionality for a detailed description of the reaction centers, which is critical to the correct representation of a chemical transformation
Summary
Retrosynthetic analysis is one of the main tasks in the planning of organic synthesis and a milestone in the computer-aided synthesis design. While many retrosynthesis software systems are based on manually coded rules [5,6,7,8], some systems [4,9] attempt to automate the rule (transform) generation process [10] in order to cover more reactions. Such an approach is certainly attractive, but the depth of the predictive models that use it depends strongly on the reaction databases they work with [11]. The transforms in RetroTransformDB correspond to a wide range of well-known and frequently used retro-reactions. Each transform was manually created and programmatically tested with the Ambit software platform [28–30]. The entire dataset was additionally curated considering all transformations (generic reactions) and their interconnections in a hierarchical fashion. The presented SMIRKS notations can be used by any chemoinformatics system that supports SMIRKS linear notation.
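To give a flavor of what a programmatic syntactic test of a transform might look like, the sketch below checks the atom-map numbers of a SMIRKS string using only the Python standard library. It is not the Ambit procedure used for the dataset, and it makes simplifying assumptions: the transform has no agents section (the sides are split on `>>`), map numbers are found with a crude regular expression rather than a real SMARTS parser, and every mapped atom is expected to appear exactly once on each side. The function names are hypothetical.

```python
import re

# Crude pattern for an atom-map suffix such as ':3]' inside a bracket atom.
ATOM_MAP = re.compile(r":(\d+)\]")


def mapped_atoms(pattern):
    """Return the atom-map numbers found in one side of a SMIRKS."""
    return [int(m) for m in ATOM_MAP.findall(pattern)]


def check_atom_maps(smirks):
    """Return a list of problems; an empty list means the check passed."""
    left, sep, right = smirks.partition(">>")
    if not sep:
        return ["missing '>>' separator"]

    problems = []
    left_maps, right_maps = mapped_atoms(left), mapped_atoms(right)

    # A map number should label a single atom on each side.
    for side, maps in (("left", left_maps), ("right", right_maps)):
        duplicates = sorted({m for m in maps if maps.count(m) > 1})
        if duplicates:
            problems.append(f"duplicate map numbers on {side} side: {duplicates}")

    # Assumed rule: mapped atoms must be paired across the transform.
    unpaired = sorted(set(left_maps) ^ set(right_maps))
    if unpaired:
        problems.append(f"map numbers present on only one side: {unpaired}")
    return problems
```

A check of this kind only catches bookkeeping mistakes in the mapping. Whether a transform is chemically meaningful still has to be verified by applying it to real structures in a chemoinformatics toolkit and by expert review.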