Abstract
We have developed an algorithm and implemented it in a software platform for developing new small-molecule anti-tumor drugs. In this study, we focused on generating molecules for the treatment of lung cancer patients. First, we employed deep learning (DL) techniques to evaluate the genes associated with poor clinical outcomes in lung cancer patients. We used generative adversarial networks (GANs) to generate additional patient data. The result of each experiment was a list of genes ordered by their impact on the desired effect. We then intersected the gene lists obtained from the experiments with overall survival (OS) and progression-free interval (PFI) data, which identified a set of genes whose expression was correlated with poor prognosis. To improve precision, we trained another DL model to distinguish between normal and tumor tissue based on gene expression. This allowed us to identify a smaller set of genes that could be targeted. Subsequently, we developed a module that predicts interactions between inhibitors and proteins. This involved representing protein amino acid sequences and chemical compound formulas in vector form, and a virtual screening of the PubChem database. The drug-protein interaction module was built on a dataset of 118,379 pairs, including 19,250 pairs describing compounds bound to proteins and 99,129 pairs describing non-bound ones. DL was applied, yielding a ROC-AUC of 0.86. Following the search for candidate molecules, we obtained 160,000 pairs with a predicted interaction probability above 0.99, as well as 2,921 pairs with a probability of 1.0. Additionally, we created a DL-based module to predict IC50 values in cell line experiments. Virtual pre-clinical trials were conducted using the selected inhibitors to identify relevant cell lines for subsequent laboratory experiments.
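The gene-selection step described above can be illustrated with a minimal sketch. It assumes each experiment yields a ranked list of gene symbols, that only the top k entries of each list are intersected, and that the tumor-versus-normal model also yields a list of informative genes used as a filter. The cutoff k, the function names, and the placeholder gene names are hypothetical; the abstract does not specify how the lists were truncated or combined.

```python
def shared_prognostic_genes(os_ranked, pfi_ranked, k):
    """Genes found in the top k of both rankings, kept in OS rank order.

    The top-k cutoff is an assumption made for this sketch.
    """
    pfi_top = set(pfi_ranked[:k])
    return [gene for gene in os_ranked[:k] if gene in pfi_top]


def refine_by_tumor_signal(candidates, tumor_vs_normal_genes):
    """Keep only candidates that also separate tumor from normal tissue."""
    informative = set(tumor_vs_normal_genes)
    return [gene for gene in candidates if gene in informative]


# Toy rankings with placeholder gene names (not real results).
os_ranked = ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"]
pfi_ranked = ["GENE_C", "GENE_A", "GENE_F", "GENE_B", "GENE_G"]
tumor_vs_normal = ["GENE_A", "GENE_C", "GENE_H"]

shared = shared_prognostic_genes(os_ranked, pfi_ranked, k=4)
targets = refine_by_tumor_signal(shared, tumor_vs_normal)

print(shared)   # ['GENE_A', 'GENE_B', 'GENE_C']
print(targets)  # ['GENE_A', 'GENE_C']
```

Keeping the OS rank order in the output preserves the importance ordering through both filters, so the shortlist can still be read from most to least influential.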
Through this process, we obtained formulas for several molecules that demonstrated predicted binding to specific proteins. During the cell experiment emulation, our feature importance algorithm selected 129 genes. For the cell experiment emulation stage, we specifically chose interactions with a probability of at least 0.9. We prioritized molecules that acted on the minimum number of cell lines with a higher probability, thus ensuring higher specificity. Ultimately, we selected 5 small molecules as potential candidates, as well as certain cell lines for their validation. The NLP technologies utilized in this study demonstrated their effectiveness in processing tens of thousands of articles. The pipeline of methods presented in this paper lays the groundwork for automated AI-driven drug discovery. We have showcased the application of modern machine learning methods, particularly DL, as well as the methods used to prepare the initial data for the learning algorithms. The performance of these methods has been validated through cross-validation using data from publicly available sources.

Citation Format: Dmitrii K Chebanov, Vsevolod A Misyurin, Nadezhda S Tatevosova. Deep learning-driven drug discovery: A breakthrough algorithm and its implication in lung cancer therapy development [abstract]. In: Proceedings of the AACR-NCI-EORTC Virtual International Conference on Molecular Targets and Cancer Therapeutics; 2023 Oct 11-15; Boston, MA. Philadelphia (PA): AACR; Mol Cancer Ther 2023;22(12 Suppl):Abstract nr A014.