Abstract
In optimizing organic chemical synthesis, researchers often struggle to efficiently generate viable synthesis procedures that conserve time and resources in the laboratory. This paper systematically analyzes multiple approaches to generating synthesis procedures for a wide variety of organic reactions, with the aim of reducing time and resource consumption in laboratory work. We investigated the suitability of different sizes of BART, T5, FLAN-T5, molT5, and classic sequence-to-sequence transformer models for this text-to-text task, using a large dataset prepared specifically for it. Experiments showed that a fine-tuned molT5-large model achieves a BLEU score of 47.75. The results demonstrate the capability of LLMs to predict chemical synthesis procedures comprising 24 distinct possible actions, many of which take parameters such as solvents, reaction agents, temperature, duration, solvent ratios, and other reaction-specific parameters. Our findings show that when only the core reactants are given as input, the models learn to correctly predict which ancillary components need to be included in the resulting procedure. These results are valuable for AI researchers and chemists alike, suggesting that curated datasets and large-language-model fine-tuning techniques can be tailored to specific reaction classes and practical applications. This research contributes to the field by demonstrating how deep-learning-based methods can be customized to the specific requirements of chemical synthesis, leading to more intelligent and resource-efficient laboratory processes.
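The BLEU metric cited above scores a generated procedure against a reference by comparing overlapping n-grams. As a minimal illustration (the paper's exact tokenization, smoothing, and corpus-level aggregation are not specified here, and the action strings below are hypothetical), sentence-level BLEU can be sketched as:

```python
from collections import Counter
import math

def ngrams(tokens, n):
    # Multiset of all contiguous n-grams in the token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU: geometric mean of modified n-gram precisions
    (n = 1..max_n) multiplied by a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    log_prec_sum = 0.0
    for n in range(1, max_n + 1):
        cand_ngrams = ngrams(cand, n)
        ref_ngrams = ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        total = max(sum(cand_ngrams.values()), 1)
        if clipped == 0:
            return 0.0  # unsmoothed: any zero precision gives BLEU = 0
        log_prec_sum += math.log(clipped / total)
    # Brevity penalty punishes candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(log_prec_sum / max_n)

# Hypothetical predicted vs. reference action sequences:
perfect = bleu("ADD water ; STIR 2 h at 25 C", "ADD water ; STIR 2 h at 25 C")
partial = bleu("ADD water ; STIR 2 h at 25 C", "ADD ethanol ; STIR 4 h at 25 C")
```

An exact match scores 1.0, while procedures that diverge in solvents or durations lose credit on the mismatched n-grams; published scores such as 47.75 correspond to this value scaled by 100.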