Abstract
The generation of constitutional isomer chemical spaces has been a subject of cheminformatics since the early 1960s, with applications in structure elucidation and elsewhere. To perform such a generation efficiently, exhaustively and without producing isomorphic duplicates, the structure generator must ensure that only canonical graphs are built during the generation step itself, rather than removing duplicates by subsequent filtering. Here we present MAYGEN, an open-source, pure-Java constitutional isomer molecular generator. The principles of MAYGEN’s architecture and algorithm are outlined, and the software is benchmarked in single-threaded mode against the state-of-the-art but closed-source solution MOLGEN, as well as against the best open-source solution, PMG. In these benchmarks, MAYGEN is on average 47 times faster than PMG and on average three times slower than MOLGEN.
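The notion of a canonical graph can be illustrated with a toy example. The sketch below is not MAYGEN's algorithm: it works on plain simple graphs without atom types or valences, and it tests each complete graph by brute force over all vertex permutations, whereas an orderly generator rejects non-canonical candidates while the structure is still being built. The class and method names (`CanonicalGraphSketch`, `code`, `isCanonical`, `countCanonical`) are hypothetical and chosen for illustration only. A labelled graph is treated as canonical here if the bit string read from the upper triangle of its adjacency matrix is maximal among all relabellings, so exactly one representative per isomorphism class is accepted.

```java
public class CanonicalGraphSketch {

    // Encode the graph as the upper-triangle bits of its adjacency matrix,
    // read row by row, after placing vertex perm[i] at position i.
    static long code(boolean[][] adj, int[] perm) {
        int n = adj.length;
        long c = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                c = (c << 1) | (adj[perm[i]][perm[j]] ? 1 : 0);
            }
        }
        return c;
    }

    // A labelled graph is canonical if no relabelling yields a larger code.
    static boolean isCanonical(boolean[][] adj) {
        int n = adj.length;
        int[] perm = new int[n];
        for (int i = 0; i < n; i++) {
            perm[i] = i;
        }
        long own = code(adj, perm);
        return !existsLarger(adj, perm, 0, own);
    }

    // Brute-force search over all permutations (fine for small n only).
    static boolean existsLarger(boolean[][] adj, int[] perm, int k, long own) {
        if (k == perm.length) {
            return code(adj, perm) > own;
        }
        for (int i = k; i < perm.length; i++) {
            swap(perm, k, i);
            boolean found = existsLarger(adj, perm, k + 1, own);
            swap(perm, k, i);
            if (found) {
                return true;
            }
        }
        return false;
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    // Enumerate every labelled simple graph on n vertices and count the
    // canonical ones, i.e. one representative per isomorphism class.
    static int countCanonical(int n) {
        int pairs = n * (n - 1) / 2;
        int count = 0;
        for (long mask = 0; mask < (1L << pairs); mask++) {
            boolean[][] adj = new boolean[n][n];
            int bit = pairs - 1;
            for (int i = 0; i < n; i++) {
                for (int j = i + 1; j < n; j++, bit--) {
                    boolean edge = ((mask >> bit) & 1L) == 1L;
                    adj[i][j] = edge;
                    adj[j][i] = edge;
                }
            }
            if (isCanonical(adj)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.println(n + " vertices: " + countCanonical(n) + " canonical graphs");
        }
    }
}
```

For 1 to 5 vertices the program reports 1, 2, 4, 11 and 34 canonical graphs, which matches the known numbers of non-isomorphic simple graphs. The cost of this filter grows factorially with the number of vertices, which is why a practical generator needs to enforce canonicity during construction instead.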
Highlights
Unconstrained isomer generation has received attention over the past decades as a means to assess the theoretically existing chemical space and as a hypothesis generator.
The work of Jean-Louis Reymond and coworkers on the GDB-11 [1], GDB-13 [2] and GDB-17 [3] databases, which enumerate molecules with up to 11, 13, and 17 non-hydrogen atoms, respectively, in the molecular formula, has laid out in detail the motivations for unconstrained isomer generation and the exploitation of its results. Such molecular generation methods can be used as hypothesis generators in areas such as computer-assisted structure elucidation, and to answer broader questions such as the exact size of a chemical space.
We present the development of MAYGEN, an open-source, pure-Java constitutional isomer generator based on the principle of orderly generation described by Grund et al. [18].
Summary
Unconstrained isomer generation has received attention over the past decades as a means to assess the theoretically existing chemical space and as a hypothesis generator. The work of Jean-Louis Reymond and coworkers on the GDB-11 [1], GDB-13 [2] and GDB-17 [3] databases, which enumerate molecules with up to 11, 13, and 17 non-hydrogen atoms, respectively, in the molecular formula, has laid out in detail the motivations for unconstrained isomer generation and the exploitation of its results. Such molecular generation methods can be used as hypothesis generators in areas such as computer-assisted structure elucidation, and to answer broader questions such as the exact size of a chemical space. See the “Results” section of the present manuscript.