Abstract
Deep learning’s automatic feature extraction has been shown to outperform traditional fingerprint-based features in virtual screening models. However, because the datasets available in early drug discovery are inherently small and imbalanced, these models face challenges such as overfitting and poor generalization to unseen data. In this work, the TranScreen pipeline is proposed, which uses transfer learning and a collection of weight initializations to overcome these challenges. A total of 182 graph convolutional neural networks are trained on molecular source datasets, and the learned knowledge is transferred to the target task for fine-tuning. The target task of p53-based bioactivity prediction, an important factor in anti-cancer drug discovery, is chosen to showcase the capability of the pipeline. After the collection of source models is trained, three different approaches are implemented to compare and rank them for a given task before fine-tuning. The results show improved model performance in multiple cases, with the best model increasing the area under the receiver operating characteristic curve (ROC-AUC) from 0.75 to 0.91 and the recall from 0.25 to 1. This improvement is vital for practical virtual screening because it lowers the number of false negatives, and it demonstrates the potential of transfer learning. The code and pre-trained models are available online.
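The abstract reports results as ROC-AUC and recall. As a minimal plain-Python illustration (the labels and scores below are hypothetical toy numbers, not data from the study), ROC-AUC can be computed as the fraction of active/inactive pairs in which the active receives the higher score, and recall as TP / (TP + FN) at a decision threshold, here assumed to be 0.5. Recall is the metric tied to false negatives: every active scored below the threshold is a missed hit.

```python
from typing import List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a random active is scored above a random inactive.

    Ties count as half a win, which matches the Mann-Whitney U formulation.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one active and one inactive")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def recall(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """TP / (TP + FN): the share of true actives that the model flags as active."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    if tp + fn == 0:
        raise ValueError("recall is undefined without any actives")
    return tp / (tp + fn)


# Hypothetical toy predictions: 1 = active, 0 = inactive.
labels: List[int] = [1, 0, 1, 0, 0, 1]
scores: List[float] = [0.9, 0.4, 0.35, 0.2, 0.6, 0.7]

print(f"ROC-AUC = {roc_auc(labels, scores):.3f}")  # 7 of 9 active/inactive pairs ranked correctly
print(f"recall  = {recall(labels, scores):.3f}")   # one active (0.35) falls below the threshold
```

Because ROC-AUC depends only on how the scores rank the molecules, it is threshold-free, whereas recall changes with the chosen cut-off. The pairwise loop is quadratic and meant only for illustration; a sort-based rank computation scales better on real screening sets.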
Highlights
Drug development is a long and costly process during which a drug candidate is discovered and extensively tested for efficacy and safety
Graph convolutional neural networks have improved the accuracy of virtual screening models, yet face the challenge of imbalanced, non-diverse, and small training datasets
Transfer learning is applied from 182 source models trained on MoleculeNet datasets
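The pretrain-then-fine-tune idea behind transfer learning can be sketched without a deep learning framework. In this minimal sketch, a logistic-regression model on hypothetical 4-dimensional feature vectors stands in for the graph convolutional networks, and two synthetic, closely related labelling rules stand in for a large source dataset and a small target dataset. The helper names `train`, `make_data`, and `accuracy` are illustrative only. The key step is that the source model's weights, rather than zeros, become the initialization for the target task.

```python
import math
import random
from typing import List, Sequence, Tuple

Dataset = List[Tuple[List[float], int]]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(data: Dataset, weights: Sequence[float], bias: float,
          lr: float, epochs: int) -> Tuple[List[float], float]:
    """SGD on the cross-entropy loss, starting from the given initialization."""
    w = list(weights)  # copy, so the source model's weights are not modified
    b = bias
    for _ in range(epochs):
        for x, y in data:
            g = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y  # dLoss/dlogit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def make_data(n: int, rule: Sequence[float], rng: random.Random) -> Dataset:
    """Hypothetical data: label 1 when the linear rule gives a positive value."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in rule]
        data.append((x, 1 if sum(r * xi for r, xi in zip(rule, x)) > 0 else 0))
    return data


def accuracy(data: Dataset, w: Sequence[float], b: float) -> float:
    hits = sum((sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5) == (y == 1)
               for x, y in data)
    return hits / len(data)


rng = random.Random(0)
source = make_data(2000, [2.0, -1.0, 0.5, 1.5], rng)        # large source task
target_train = make_data(20, [1.8, -1.2, 0.4, 1.6], rng)    # small, related target task
target_test = make_data(500, [1.8, -1.2, 0.4, 1.6], rng)

# 1) Pre-train on the source task from a zero initialization.
src_w, src_b = train(source, [0.0] * 4, 0.0, lr=0.1, epochs=5)
# 2a) Baseline: train on the small target set from scratch.
scratch_w, scratch_b = train(target_train, [0.0] * 4, 0.0, lr=0.1, epochs=5)
# 2b) Transfer: start from the source weights and fine-tune with a smaller step size.
ft_w, ft_b = train(target_train, src_w, src_b, lr=0.01, epochs=5)

print(f"from scratch: {accuracy(target_test, scratch_w, scratch_b):.3f}")
print(f"fine-tuned:   {accuracy(target_test, ft_w, ft_b):.3f}")
```

How much the fine-tuned model gains over the scratch baseline depends on how closely the source and target tasks are related; the seed and synthetic rules here are arbitrary, so the printed numbers illustrate the workflow rather than reproduce any result from the study.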
Summary
Drug development is a long and costly process during which a drug candidate is discovered and extensively tested for efficacy and safety. In traditional virtual screening pipelines, molecular descriptors and fingerprints are used to extract features from the input molecules, and these features are passed to a machine learning model for training. This pipeline has been used for many virtual screening tasks, such as kinase inhibition prediction [3], side-effect prediction [4], cytotoxicity prediction [5], and anti-cancer agent prediction [6].
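The descriptor/fingerprint pipeline described above (turn each molecule into a fixed-length bit vector, then train a model on those vectors) can be sketched in plain Python. In this minimal sketch, the hypothetical `toy_fingerprint` hashes character n-grams of a SMILES string into bit positions; it is a deliberately crude stand-in for real fingerprints such as ECFP/Morgan fingerprints, which would normally come from a cheminformatics toolkit. A k-nearest-neighbour classifier with Tanimoto similarity stands in for the machine learning model, and the molecules and activity labels are invented for illustration.

```python
import zlib
from typing import FrozenSet, List, Tuple


def toy_fingerprint(smiles: str, n_bits: int = 1024, max_n: int = 3) -> FrozenSet[int]:
    """Set of 'on' bits: each character n-gram (n = 1..max_n) is hashed to one bit.

    Ignores chemistry entirely; it only mimics the molecule -> bit-vector step.
    zlib.crc32 is used instead of hash() so results are stable across runs.
    """
    bits = set()
    for n in range(1, max_n + 1):
        for i in range(len(smiles) - n + 1):
            bits.add(zlib.crc32(smiles[i:i + n].encode()) % n_bits)
    return frozenset(bits)


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """|A ∩ B| / |A ∪ B| on the sets of 'on' bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def knn_score(query: str, train: List[Tuple[str, int]], k: int = 3) -> float:
    """Fraction of actives among the k training molecules most similar to the query."""
    fq = toy_fingerprint(query)
    ranked = sorted(train, key=lambda item: tanimoto(fq, toy_fingerprint(item[0])),
                    reverse=True)
    top = ranked[:k]
    return sum(label for _, label in top) / len(top)


# Hypothetical training set: SMILES strings with made-up activity labels.
train_set = [("c1ccccc1O", 1), ("c1ccccc1N", 1), ("c1ccccc1C", 1),
             ("CCCCCC", 0), ("CCCCO", 0), ("CCN", 0)]

print(knn_score("c1ccccc1Cl", train_set))  # neighbours are the three aromatic actives
print(knn_score("CCCCCO", train_set))      # neighbours are the three aliphatic inactives
```

With real fingerprints the structure stays the same and only the featurizer changes. Graph convolutional networks, the approach discussed in the abstract, replace this hand-designed featurization with features learned directly from the molecular graph.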