Retrosynthesis prediction using grammar-based neural machine translation: An information-theoretic approach

Vipul Mann, Venkat Venkatasubramanian

doi:10.1016/j.compchemeng.2021.107533

Open Access

https://doi.org/10.1016/j.compchemeng.2021.107533

Journal: Computers & Chemical Engineering	Publication Date: Sep 11, 2021
Citations: 19	License type: cc-by-nc-nd

Affiliation: Columbia University

Abstract

Retrosynthetic prediction is one of the main challenges in chemical synthesis because it requires a search over the space of plausible chemical reactions that often results in complex, multi-step, branched synthesis trees for even moderately complex organic reactions. Here, we propose an approach that performs single-step retrosynthesis prediction using SMILES grammar-based representations in a neural machine translation framework. Information-theoretic analyses of such grammar representations reveal that they are superior to SMILES representations and are better suited for machine learning tasks due to their underlying redundancy and high information capacity. We report a top-1 prediction accuracy of 43.8% (syntactic validity 95.6%) and a maximal fragment (MaxFrag) accuracy of 50.4%. Comparing our model’s performance with previous work that used character-based SMILES representations demonstrates a significant reduction in grammatically invalid predictions and improved prediction accuracy. Fewer invalid predictions for both known and unknown reaction class scenarios demonstrate the model’s ability to learn the underlying SMILES grammar efficiently.
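
As an illustration of the two accuracy metrics named in the abstract, the following is a minimal sketch and not the authors' evaluation code. It assumes that predicted and reference reactant sets are already canonicalized SMILES strings with fragments separated by `.`, and it uses string length as a rough proxy for fragment size; a real pipeline would more likely canonicalize and rank fragments with a cheminformatics toolkit. The function names are hypothetical.

```python
def split_fragments(smiles):
    """Split a dot-separated SMILES string into its non-empty fragments."""
    return [frag for frag in smiles.split(".") if frag]


def max_fragment(smiles):
    """Return the largest fragment (assumption: longest string, ties broken lexically)."""
    frags = split_fragments(smiles)
    return max(frags, key=lambda f: (len(f), f)) if frags else ""


def top1_accuracy(predictions, targets):
    """Fraction of cases where the full predicted reactant set matches the reference.

    Fragment order is ignored; strings are assumed to be canonical already.
    """
    hits = sum(
        sorted(split_fragments(p)) == sorted(split_fragments(t))
        for p, t in zip(predictions, targets)
    )
    return hits / len(targets)


def maxfrag_accuracy(predictions, targets):
    """Fraction of cases where only the largest fragment has to match."""
    hits = sum(
        max_fragment(p) == max_fragment(t)
        for p, t in zip(predictions, targets)
    )
    return hits / len(targets)


# Toy data (made up for illustration, not taken from the paper).
targets = ["CC(=O)O.OCC", "c1ccccc1Br.OB(O)c1ccccc1", "CCOC(=O)c1ccccc1.Cl"]
predictions = ["OCC.CC(=O)O", "c1ccccc1Br.OB(O)C", "CCOC(=O)c1ccccc1.Br"]

print("top-1:", top1_accuracy(predictions, targets))
print("MaxFrag:", maxfrag_accuracy(predictions, targets))
```

In the toy data, the third prediction gets the main reactant right but the small reagent wrong, so it counts toward MaxFrag accuracy but not toward top-1 accuracy. This is why MaxFrag accuracy is never lower than top-1 accuracy on the same predictions.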

Full Text