Abstract Clinical trial failure is a persistent problem, because each new drug must surpass the effectiveness of existing treatment regimens, which becomes harder over time. A second major problem is treating patients who have developed resistance to current therapies. Using drug combinations or off-label drugs, where the approved indication does not match the diagnosis, is essentially experimental, because there are insufficient data on which drug or combination to choose. This work proposes modeling patients computationally from gene expression and clinical data, using deep learning and generative adversarial networks (GANs) as modeling tools. Training data were drawn from publicly available databases: TCGA, DrugBank, CCLE, and GDSC. The modeling rests on the hypothesis of similarity between patients, between drugs, and between patients' organs and tissues and cell lines, with similarity computed mathematically. The result is a patient model that takes drugs and their combinations as input and outputs survival probabilities. GANs can generate such model data in any required quantity to form observation and control groups. This makes it possible to simulate clinical trials, forecast their outcomes, and, most importantly, optimize trial parameters to maximize the likelihood of success. We obtained patient data and drug information from The Cancer Genome Atlas (TCGA). Biologicals were excluded; only small molecules, including targeted therapies and chemotherapeutic agents, were retained. The dataset comprised 3225 patients and the effects of 161 drugs.
According to our hypothesis, a patient's profile comprises three groups of factors, each contributing approximately equally to the duration of overall survival (OS) and disease-free survival (DFS). The first group comprises clinical data such as age, diagnosis, and disease stage. The second consists of genetic data, specifically gene expression profiles, which capture the functional characteristics of genes. The third represents the drug or drug combination used to treat the patient. All three groups must therefore have comparable dimensions of influence. Drug information was obtained from the DrugBank database. Drug molecules were represented in SMILES (Simplified Molecular Input Line Entry System) notation and converted into 100-dimensional vectors using embedding techniques derived from natural language processing. Drug combinations were represented by vector addition of the individual drug vectors. The prediction targets were the numerical durations of the OS and DFS intervals, one for each respective task.

Citation Format: Dmitrii K. Chebanov, Vsevolod A. Misyurin. Modeling of new drugs clinical trials outcome with patients’ digital twins cohorts [abstract]. In: Proceedings of the AACR Special Conference on Ovarian Cancer; 2023 Oct 5-7; Boston, Massachusetts. Philadelphia (PA): AACR; Cancer Res 2024;84(5 Suppl_2):Abstract nr B023.
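The drug-combination representation described in the abstract (vector addition of 100-dimensional drug embeddings) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the embedding model, so random vectors stand in for the SMILES-derived embeddings, and the helper names `make_placeholder_embeddings` and `combine_drugs` as well as the drug names are hypothetical and chosen only for illustration.

```python
import numpy as np

EMBED_DIM = 100  # embedding dimensionality stated in the abstract


def make_placeholder_embeddings(drug_names, dim=EMBED_DIM, seed=0):
    """Hypothetical stand-in for SMILES-derived embeddings.

    Assumption: real vectors would come from an NLP-style embedding of
    SMILES strings; random vectors are used here purely as placeholders.
    """
    rng = np.random.default_rng(seed)
    return {name: rng.normal(size=dim) for name in drug_names}


def combine_drugs(drug_names, embeddings):
    """Represent a drug combination as the element-wise sum of its drug vectors."""
    if not drug_names:
        raise ValueError("a regimen needs at least one drug")
    return np.sum([embeddings[name] for name in drug_names], axis=0)


# Illustrative drug names only; they are not taken from the abstract.
embeddings = make_placeholder_embeddings(["carboplatin", "paclitaxel", "olaparib"])
combo = combine_drugs(["carboplatin", "paclitaxel"], embeddings)
print(combo.shape)  # (100,)
```

Under this scheme a combination vector has the same length as a single-drug vector and does not depend on the order in which the drugs are listed, so one model input of fixed size can cover both monotherapy and combination regimens.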