Abstract

Cocrystal engineering has been widely applied in the pharmaceutical, chemical and materials fields. However, choosing coformers effectively remains a challenging experimental task. Here we develop a graph neural network (GNN)-based deep learning framework to quickly predict cocrystal formation. To capture the main driving force of crystallization from 6819 positive and 1052 negative samples reported in experiments, we explore a GNN framework that integrates important prior knowledge into end-to-end learning on the molecular graph. The model is validated against seven competitive models and on three challenging independent test sets involving pharmaceutical cocrystals, π–π cocrystals and energetic cocrystals, showing superior performance with accuracy higher than 96% and confirming its robustness and generalization. Furthermore, one new energetic cocrystal predicted by the model was successfully synthesized, showcasing the model's high potential in practice. All data and source code are available at https://github.com/Saoge123/ccgnet to aid the cocrystal community.

Highlights

  • Cocrystal engineering has been widely applied in the pharmaceutical, chemical and materials fields

  • We explore a flexible graph neural network (GNN)-based deep learning (DL) framework that integrates empirical knowledge into end-to-end learning on the molecular graph and that can be applied, through transfer learning, to cocrystals (CCs) that differ significantly from the training dataset

  • We develop a GNN-based DL model, coupled with a feature-complementary strategy, to accurately predict cocrystal formation


Summary

Results

We select pyrene as a case to validate the generalization performance of CCGNet on π–π CCs. The independent test set involving pyrene contains 58 positive samples and 6 negative ones collected from experimental reports (see Supplementary Table 9 for details). When we directly apply the CCGNet model and the seven competitive models, all trained on the cocrystal dataset (i.e., the CC dataset) of 7871 samples, to the independent test sets of TNT and CL-20, the balanced accuracies are very low, lower than 61% for TNT and 59% for CL-20 (see Fig. 3d, e and Supplementary Table 10), in contrast to the high performance on the pharmaceutical and π–π CCs. The reason can be attributed to the fact that energetic molecules differ significantly in structure from the common organic CCs in the CSD, for example in their abundant nitro groups or caged structures such as CL-20. To cope with the problem, it is a …
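
The results above are reported as balanced accuracy, which matters because the test sets are heavily skewed toward positive samples. The sketch below is not taken from the CCGNet code. It assumes the standard definition of balanced accuracy (the mean of the recall on positive and negative samples), and the function names and the "always positive" classifier are hypothetical illustrations. Only the class sizes (58 positive, 6 negative) come from the pyrene test set described above.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of the recall on the positive class and on the negative class.

    Assumes binary labels: 1 = cocrystal forms, 0 = it does not.
    """
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present in y_true")
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    true_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return 0.5 * (true_pos / n_pos + true_neg / n_neg)


# Class sizes of the pyrene test set: 58 positive and 6 negative samples.
y_true = [1] * 58 + [0] * 6

# Hypothetical trivial classifier that predicts "cocrystal" for every pair.
y_pred_all_positive = [1] * len(y_true)

plain = accuracy(y_true, y_pred_all_positive)
balanced = balanced_accuracy(y_true, y_pred_all_positive)

print(f"plain accuracy:    {plain:.3f}")
print(f"balanced accuracy: {balanced:.3f}")
```

On this class split, a classifier that has learned nothing still reaches a plain accuracy of 58/64, while its balanced accuracy is 0.5. Under the assumed definition, balanced accuracies below 61% on the energetic test sets are therefore only modestly above this no-information baseline.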

