Abstract

Text processing techniques from Natural Language Processing (NLP) find applications in many industries, such as pharmaceuticals, automation, and automotive. Drug design using variational autoencoders is a popular data-assisted technique for designing drug molecules with control over molecular properties. A variational autoencoder generates a continuous latent space, which can be optimized. This paper introduces a constrained variational autoencoder-based architecture for molecular generation using the SMILES format. The proposed pipeline generates molecules, filters them based on scores, and then determines the optimal molecules using mature NLP techniques. To generate a more meaningful latent space, a condition vector of molecular properties is combined with the SMILES representation of the molecules. A tunable parameter (diversity, D) is also used to control the diversity of the generated molecules. The proposed architecture is evaluated on standard datasets. Validity, uniqueness, and Fréchet ChemNet Distance (FCD) are the evaluation metrics used to assess the performance of the model. The validity of the proposed model is highest (92.11%) at diversity level 1. As the diversity level increases, the validity of the generated molecules decreases. This is intuitively consistent, because increased diversity reduces replicas and improves variety in the generated molecules. Thus, the proposed model provides control over the diversity of the generated molecules. The results indicate that the proposed method outperforms other SMILES-based methods and offers a new direction for the generation of desired molecules.
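
The validity and uniqueness metrics named in the abstract can be illustrated with a minimal sketch. It assumes the commonly used definitions (validity as the fraction of generated strings that pass a validity check, uniqueness as the fraction of valid strings that are distinct); the abstract does not spell out the exact definitions used in the paper. The function names and the `toy_is_valid` check are hypothetical stand-ins for illustration only. In practice a cheminformatics toolkit would perform the check, for example RDKit's `Chem.MolFromSmiles`, which returns `None` for a string it cannot parse.

```python
from typing import Callable, Iterable, Tuple


def toy_is_valid(smiles: str) -> bool:
    """Hypothetical stand-in check, NOT real chemical validation.

    Accepts a string if it is non-empty, its parentheses are balanced,
    and each ring-closure digit occurs an even number of times.
    """
    if not smiles:
        return False
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # closing bracket with no matching opener
                return False
    if depth != 0:  # at least one branch was never closed
        return False
    return all(smiles.count(d) % 2 == 0 for d in "123456789")


def validity_and_uniqueness(
    generated: Iterable[str],
    is_valid: Callable[[str], bool] = toy_is_valid,
) -> Tuple[float, float]:
    """Return (validity, uniqueness) for a batch of generated SMILES strings.

    Uniqueness is computed on the raw strings; a real pipeline would
    usually canonicalize the SMILES first so that equivalent molecules
    written differently are counted once.
    """
    generated = list(generated)
    valid = [s for s in generated if is_valid(s)]
    validity = len(valid) / len(generated) if generated else 0.0
    uniqueness = len(set(valid)) / len(valid) if valid else 0.0
    return validity, uniqueness


# Five made-up samples: one duplicate ("CCO") and one unclosed branch ("C(C").
samples = ["CCO", "CCO", "c1ccccc1", "C(C", "CC(=O)O"]
validity, uniqueness = validity_and_uniqueness(samples)
print(f"validity={validity:.2f}, uniqueness={uniqueness:.2f}")
# prints: validity=0.80, uniqueness=0.75
```

Passing the checker as a parameter keeps the metric logic independent of any particular toolkit, so the toy check can be swapped for a real parser without changing the rest of the code.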
