Comparative analysis of regression algorithms for drug response prediction using GDSC dataset
Background: Drug response prediction can infer the relationship between an individual’s genetic profile and a drug, which can be used to determine the choice of treatment for an individual patient. Drug response is now commonly predicted using machine learning. However, high-throughput sequencing data produces thousands of features per patient. In addition, because many regression and feature selection algorithms exist, it is difficult for researchers to know which algorithm is appropriate for prediction. Methods: We compared and evaluated the performance of 13 representative regression algorithms using the Genomics of Drug Sensitivity in Cancer (GDSC) dataset. Three analyses were conducted to show the effect of feature selection methods, multiomics information, and drug categories on drug response prediction. Results: In the experiments, the Support Vector Regression algorithm and gene features selected with the LINCS L1000 dataset showed the best performance in terms of accuracy and execution time. However, integration of mutation and copy number variation information did not contribute to the prediction. Among the drug groups, responses of drugs related to hormone-related pathways were predicted with relatively high accuracy. Conclusion: This study can help bioinformatics researchers design data processing steps and select algorithms for drug response prediction, and develop new drug response prediction models based on the GDSC or other high-throughput sequencing datasets.
- # Genomics Of Drug Sensitivity In Cancer Dataset
- # Genomics Of Drug Sensitivity In Cancer
- # Regression Algorithms
- # Effect Of Feature Selection Methods
- # Copy Number Variation Information
- # Genomics Of Drug Sensitivity
- # Support Vector Regression Algorithm
- # Drug Sensitivity In Cancer
- # Drug Response Prediction
- # High-throughput Sequencing Datasets
- Research Article
6
- 10.3390/diagnostics13122043
- Jun 13, 2023
- Diagnostics
With the advent of high-throughput screening, in silico drug response analysis has opened many research avenues in the field of personalized medicine. Over the past decade, many prediction techniques have been proposed for antineoplastic (anti-cancer) drug response, but there is still a need for improvement in drug sensitivity prediction. The intent of this research study is to propose a framework, namely NeuPD, to validate potential anti-cancer drugs against a panel of cancer cell lines in publicly available datasets. The datasets used in this work are Genomics of Drug Sensitivity in Cancer (GDSC) and Cancer Cell Line Encyclopedia (CCLE). As not all drugs are effective on cancer cell lines, we have worked on 10 essential drugs from the GDSC dataset that have achieved the best modeling results in previous studies. We also extracted 1610 essential oncogene expressions from 983 cell lines from the same dataset. From the CCLE dataset, 16,383 gene expressions from 1037 cell lines and 24 drugs have been used in our experiments. For dimensionality reduction, Pearson correlation is applied to best fit the model. We integrate the genomic features of cell lines and drugs’ fingerprints to fit the neural network model. To evaluate the proposed NeuPD framework, we used repeated K-fold cross-validation (K = 10, 5 repeats) and report performance in terms of root mean square error (RMSE) and coefficient of determination (R2). The results obtained on the GDSC dataset, measured using these metrics, show that our proposed NeuPD framework has outperformed existing approaches with an RMSE of 0.490 and R2 of 0.929.
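The evaluation protocol named in this abstract (repeated K-fold cross-validation scored by RMSE and R2) can be sketched in plain Python as follows. This is a minimal illustration, not the NeuPD code: the function names `repeated_kfold`, `rmse`, `r2`, and `evaluate`, and the `fit_predict` callback, are hypothetical and chosen only for this example.

```python
import math
import random


def repeated_kfold(n_samples, k=10, repeats=5, seed=0):
    """Yield (train_idx, test_idx) pairs for `repeats` reshuffled K-fold rounds."""
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        for fold in range(k):
            test = order[fold::k]  # every k-th shuffled sample forms one fold
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            yield train, test


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination; undefined when all y_true values are equal."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def evaluate(y, fit_predict, k=10, repeats=5, seed=0):
    """Average RMSE and R2 over all folds.

    fit_predict(train_idx, test_idx) must return one prediction per test index.
    """
    scores = []
    for train, test in repeated_kfold(len(y), k, repeats, seed):
        y_true = [y[i] for i in test]
        y_pred = fit_predict(train, test)
        scores.append((rmse(y_true, y_pred), r2(y_true, y_pred)))
    n = len(scores)
    return sum(s[0] for s in scores) / n, sum(s[1] for s in scores) / n
```

With K = 10 and 5 repeats, the model is fitted and scored 50 times. Averaging over the repeats reduces the dependence of the reported metrics on any single random partition.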
- Research Article
95
- 10.1016/j.omtn.2019.05.017
- Jun 4, 2019
- Molecular Therapy - Nucleic Acids
Precision medicine has become a novel and rising concept, which depends much on the identification of individual genomic signatures for different patients. The cancer cell lines could reflect the “omic” diversity of primary tumors, based on which many works have been carried out to study the cancer biology and drug discovery both in experimental and computational aspects. In this work, we presented a novel method to utilize weighted graph regularized matrix factorization (WGRMF) for inferring anticancer drug response in cell lines. We constructed a p-nearest neighbor graph to sparsify drug similarity matrix and cell line similarity matrix, respectively. Using the sparsified matrices in the graph regularization terms, we performed matrix factorization to generate the latent matrices for drug and cell line. The graph regularization terms including neighbor information could help to exclude the noisy ingredient and improve the prediction accuracy. The 10-fold cross-validation was implemented, and the Pearson correlation coefficient (PCC), root-mean-square error (RMSE), PCCsr, and RMSEsr averaged over all drugs were calculated to evaluate the performance of WGRMF. The results on the Genomics of Drug Sensitivity in Cancer (GDSC) dataset are 0.64 ± 0.16, 1.37 ± 0.35, 0.73 ± 0.14, and 1.71 ± 0.44 for PCC, RMSE, PCCsr, and RMSEsr in turn. And for the Cancer Cell Line Encyclopedia (CCLE) dataset, WGRMF got results of 0.72 ± 0.09, 0.56 ± 0.19, 0.79 ± 0.07, and 0.69 ± 0.19, respectively. The results showed the superiority of WGRMF compared with previous methods. Besides, based on the prediction results using the GDSC dataset, three types of case studies were carried out. The results from both cross-validation and case studies have shown the effectiveness of WGRMF on the prediction of drug response in cell lines.
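The p-nearest neighbor sparsification step mentioned above can be illustrated with a short sketch. The abstract does not say how the graph is symmetrized, so the rule used here (keep an entry if either item lists the other among its p nearest neighbors) is an assumption, and the names `p_nearest_neighbors` and `sparsify` are hypothetical.

```python
def p_nearest_neighbors(sim, p):
    """For each row i, return the set of the p most similar other items."""
    neighbors = []
    for i, row in enumerate(sim):
        others = [j for j in range(len(row)) if j != i]
        others.sort(key=lambda j: row[j], reverse=True)
        neighbors.append(set(others[:p]))
    return neighbors


def sparsify(sim, p):
    """Zero out similarities that fall outside the p-nearest-neighbor graph.

    Assumed rule: entry (i, j) is kept when j is among i's p nearest
    neighbors or i is among j's; diagonal entries are always kept.
    """
    nn = p_nearest_neighbors(sim, p)
    n = len(sim)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j or j in nn[i] or i in nn[j]:
                out[i][j] = sim[i][j]
    return out
```

Applied to a drug or cell line similarity matrix, this keeps only the strongest local relationships, which is the sense in which the graph regularization term can exclude noisy similarities.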
- Research Article
178
- 10.1186/s12885-017-3500-5
- Aug 2, 2017
- BMC Cancer
Background: Human cancer cell lines are used in research to study the biology of cancer and to test cancer treatments. Several large panels of several hundred human cancer cell lines, characterized with genomic and pharmacological data, are now available. The ability to predict drug responses using these pharmacogenomics data can facilitate the development of precision cancer medicines. Although several methods have been developed to address drug response prediction, there are many challenges in obtaining accurate predictions. Methods: Based on the fact that similar cell lines and similar drugs exhibit similar drug responses, we adopted a similarity-regularized matrix factorization (SRMF) method to predict anticancer drug responses of cell lines using chemical structures of drugs and baseline gene expression levels in cell lines. Specifically, chemical structural similarity of drugs and gene expression profile similarity of cell lines were considered as regularization terms, which were incorporated into the drug response matrix factorization model. Results: We first demonstrated the effectiveness of SRMF using a set of simulation data and compared it with two typical similarity-based methods. Furthermore, we applied it to the Genomics of Drug Sensitivity in Cancer (GDSC) and Cancer Cell Line Encyclopedia (CCLE) datasets, and the performance of SRMF exceeded that of three state-of-the-art methods. We also applied SRMF to estimate the missing drug response values in the GDSC dataset. Even though SRMF does not specifically model mutation information, it could correctly predict drug-cancer gene associations that are consistent with existing data, and identify novel drug-cancer gene associations that are not found in existing data as well. SRMF can also aid in drug repositioning.
The newly predicted drug responses of the GDSC dataset suggest that non-small cell lung cancer (NSCLC) is sensitive to the mTOR inhibitor rapamycin, and that expression of AKR1C3 and HINT1 may be adjunct markers of cell line sensitivity to rapamycin. Conclusions: Our analysis showed that the proposed data integration method is able to improve the accuracy of prediction of anticancer drug responses in cell lines, and can identify consistent and novel drug-cancer gene associations compared to existing data as well as aid in drug repositioning.
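The abstract above describes drug and cell line similarities entering the factorization as regularization terms. One plausible form of such an objective is written out below. It is a sketch consistent with that description, not a formula quoted from the paper, and every symbol is an assumption introduced here: $Y$ is the drug-by-cell-line response matrix, $W$ a 0/1 mask of observed entries, $U$ and $V$ the latent drug and cell line matrices, $S_d$ and $S_c$ the drug and cell line similarity matrices, and $\lambda_l, \lambda_d, \lambda_c$ the regularization weights.

```latex
\min_{U,\,V}\;
  \left\lVert W \circ \left( Y - U V^{\top} \right) \right\rVert_F^{2}
  + \lambda_l \left( \lVert U \rVert_F^{2} + \lVert V \rVert_F^{2} \right)
  + \lambda_d \left\lVert S_d - U U^{\top} \right\rVert_F^{2}
  + \lambda_c \left\lVert S_c - V V^{\top} \right\rVert_F^{2}
```

The first term fits only the observed responses, so missing entries of $Y$ can later be read off from $U V^{\top}$. The last two terms pull drugs with similar structures, and cell lines with similar expression profiles, toward nearby latent vectors.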
- Abstract
- 10.1093/neuonc/noad073.212
- Jun 12, 2023
- Neuro-Oncology
INTRODUCTION: Pediatric low-grade gliomas (pLGG), the most common brain tumors in children, are driven by alterations in the MAPK pathway. Several clinical trials have shown the potential for MAPK inhibitors (MAPKi) treatment in pLGG. However, the range of response is broad, even within entities sharing the same driving genetic MAPK alteration. A predictive stratification tool is needed to identify patients that will be more likely to benefit from MAPKi therapy. METHODS: We generated gene-expression-based MAPKi sensitivity scores (MSS) for each MAPKi class (BRAFi, MEKi, ERKi), based on MAPK-related genes differentially regulated between MAPKi sensitive and non-sensitive cell lines from the Genomics of Drug Sensitivity in Cancer (GDSC) dataset. Single sample Gene Set Enrichment Analysis (ssGSEA) was used to measure and validate our MSSs in the GDSC dataset and an independent PDX dataset (XevaDB). The validated signatures were tested in a pLGG-specific background, using gene expression data from PA cell lines and primary pLGG samples. RESULTS: Our MSS could differentiate MAPKi sensitive cells in the GDSC dataset, and significantly correlated with MAPKi response in the XevaDB PDX dataset. The MSS were able to differentiate glioma entities with differing MAPK alterations from non-MAPK altered entities, and showed the highest scores in pLGG. The MSSs were heterogeneous within pLGG entities with a common MAPK alteration, as observed in MAPKi clinical studies. Intriguingly, a strong correlation between our MSS and the predicted immune cell infiltration rate, as determined by the Estimate score, was observed and confirmed in a pLGG scRNA sequencing dataset. CONCLUSION: These data demonstrate the relevance of gene-expression signatures to predict response to MAPKi treatment in pLGG, and will be further investigated in a prospective manner in upcoming clinical trials.
In addition, our data could suggest a role of immune infiltration in the response to MAPKi in pLGG that warrants further validation.
- Abstract
- 10.1093/neuonc/noac079.340
- Jun 3, 2022
- Neuro-Oncology
Pediatric low-grade gliomas (pLGG), the most common brain tumors in children, are driven by alterations in the MAPK pathway. Several clinical trials have shown the potential for MAPK inhibitor (MAPKi) treatment in pLGG. However, the range of response to MAPKi is heterogeneous, even between tumors sharing the same driving MAPK alteration. A predictive stratification tool is needed to identify tumors that will be sensitive to MAPK inhibition. We generated sensitivity gene signatures for each MAPKi class (BRAFi, MEKi, ERKi), based on MAPK-related genes differentially regulated between MAPKi sensitive and non-sensitive cell lines from the Genomics of Drug Sensitivity in Cancer (GDSC) dataset. Single sample Gene Set Enrichment Analysis was used to measure and validate the MAPKi predictive sensitivity scores in the GDSC dataset and an independent patient-derived xenograft (PDX) dataset (XevaDB). The validated signatures were tested in a pLGG-specific background, using gene expression data from pLGG cell lines and primary pLGG samples. Our MAPKi sensitivity signatures discriminated MAPKi sensitive and non-sensitive cells in the GDSC dataset, and significantly correlated with MAPKi response in the PDX dataset. The sensitivity scores discerned gliomas with varying MAPK alterations from those without MAPK alterations, and showed higher scores in pLGG compared to high-grade gliomas and normal brain tissue. MAPKi-predicted sensitivity was heterogeneous within pLGG groups with a common MAPK alteration, as observed in MAPKi clinical trials. Intriguingly, we observed a strong positive correlation between our MAPKi sensitivity signature scores and the predicted immune cell infiltration rate as determined by the ESTIMATE score. These data demonstrate the potential relevance of gene-expression signatures to predict response to MAPKi treatment in pLGG patients, worthy of further investigation in a prospective manner in upcoming clinical trials.
In addition, our data could support a role of immune cell infiltration in the response to MAPKi in pLGG, warranting further validation.
- Research Article
10
- 10.1158/1538-7445.am2013-2206
- Apr 15, 2013
- Cancer Research
The Genomics of Drug Sensitivity in Cancer (GDSC; www.cancerRxgene.org) resource facilitates development of targeted cancer therapies through pre-clinical identification of therapeutic biomarkers. GDSC is the largest public resource for information on drug sensitivity in cancer cells and links these data to extensive genomic information to identify molecular features that influence anticancer drug response. There is compelling evidence that alterations in cancer genomes strongly influence clinical responses to anticancer therapies. There are several examples where genomic changes are used as molecular biomarkers to stratify patients most likely to benefit from a treatment (e.g. BRAF in melanoma). Despite these successes, the majority of cancer drugs have not been linked to specific molecular features that could be used to direct their clinical use to maximize patient benefit. We are using pharmacogenomic profiling in cancer cell lines as a biomarker discovery platform by systematically linking pharmacological data with genomic information in cancer cells. The GDSC database contains drug sensitivity data generated from high-throughput screening performed by the Cancer Genome Project at the Wellcome Trust Sanger Institute and the Center for Molecular Therapeutics at Massachusetts General Hospital using a collection of >1,200 cancer cell lines. GDSC release v3 (November 2012) contains drug sensitivity data for almost 80,000 experiments, describing response to 142 anticancer drugs across over 700 cancer cell lines. To identify molecular markers of drug response, cell line drug sensitivity data are integrated with large genomic datasets obtained from COSMIC (Catalogue of Somatic Mutations in Cancer), including information on somatic mutations in cancer genes, gene amplification and deletion, tissue type and transcriptional data. Analysis of GDSC data is through a web portal based on queries of specific anticancer drugs or cancer genes.
Interactive graphical representations of the data are used throughout with links to related resources, and all datasets are freely available and downloadable. The GDSC database will undergo significant expansion in coming years as drug sensitivity and genomic datasets increase in size and complexity. GDSC provides a unique public resource incorporating large drug sensitivity and genomic datasets to facilitate discovery of new therapeutic biomarkers for cancer therapies. Citation Format: Wanjuan Yang, Jorge Soares, Patricia Greninger, Elena Edelman, Howard Lightfoot, Simon Forbes, Ramaswamy Sridhar, P. Andrew Futreal, Daniel Haber, Michael Stratton, Cyril Benes, Ultan McDermott, Mathew Garnett. Genomics of Drug Sensitivity in Cancer (GDSC): A resource for therapeutic biomarker discovery in cancer cells. [abstract]. In: Proceedings of the 104th Annual Meeting of the American Association for Cancer Research; 2013 Apr 6-10; Washington, DC. Philadelphia (PA): AACR; Cancer Res 2013;73(8 Suppl):Abstract nr 2206. doi:10.1158/1538-7445.AM2013-2206
- Research Article
27
- 10.1186/s12859-021-04146-z
- May 25, 2021
- BMC Bioinformatics
Background: Predicting the drug response of a patient is important for precision oncology. In recent studies, multi-omics data have been used to improve the prediction accuracy of drug response. Although multi-omics data are good resources for drug response prediction, the large dimension of data tends to hinder performance improvement. In this study, we aimed to develop a new method, which can effectively reduce the large dimension of data, based on the supervised deep learning model for predicting drug response. Results: We proposed a novel method called Supervised Feature Extraction Learning using Triplet loss (Super.FELT) for drug response prediction. Super.FELT consists of three stages, namely, feature selection, feature encoding using a supervised method, and binary classification of drug response (sensitive or resistant). We used multi-omics data including mutation, copy number aberration, and gene expression, and these were obtained from cell lines [Genomics of Drug Sensitivity in Cancer (GDSC), Cancer Cell Line Encyclopedia (CCLE), and Cancer Therapeutics Response Portal (CTRP)], patient-derived tumor xenografts (PDX), and The Cancer Genome Atlas (TCGA). GDSC was used for training and cross-validation tests, and CCLE, CTRP, PDX, and TCGA were used for external validation. We performed ablation studies for the three stages and verified that the use of multi-omics data guarantees better performance of drug response prediction. Our results verified that Super.FELT outperformed the other methods at external validation on PDX and TCGA and was good at cross-validation on GDSC and external validation on CCLE and CTRP. In addition, through our experiments, we confirmed that using multi-omics data is useful for external non-cell line data. Conclusion: By separating the three stages, Super.FELT achieved better performance than the other methods.
Through our results, we found that it is important to train encoders and a classifier independently, especially for external test on PDX and TCGA. Moreover, although gene expression is the most powerful data on cell line data, multi-omics promises better performance for external validation on non-cell line data than gene expression data. Source codes of Super.FELT are available at https://github.com/DMCB-GIST/Super.FELT.
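Super.FELT's supervised encoding stage is named after the triplet loss. The standard form of that loss is sketched below for a single (anchor, positive, negative) triplet. The margin value and the use of plain Euclidean distance are assumptions made for this illustration (some implementations use squared distances), and the function names are hypothetical rather than taken from the Super.FELT repository.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss for one triplet of encoded samples.

    The loss is zero once the anchor is closer to the positive (same class,
    e.g. both sensitive) than to the negative (other class) by at least `margin`.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)
```

Minimizing this loss over many triplets pushes encodings of sensitive and resistant samples apart, so a separate classifier can then be trained on the encoded features.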
- Research Article
4
- 10.1007/s10994-023-06338-5
- Aug 7, 2023
- Machine Learning
We develop a new model for tensor completion which incorporates noisy side information available on the rows and columns of a 3-dimensional tensor. This method learns a low rank representation of the data along with regression coefficients for the observed noisy features. Given this model, we propose an efficient alternating minimization algorithm to find high-quality solutions that scales to large data sets. Through extensive computational experiments, we demonstrate that this method leads to significant gains in out-of-sample accuracy filling in missing values in both simulated and real-world data. We consider the problem of imputing drug response in three large-scale anti-cancer drug screening data sets: the Genomics of Drug Sensitivity in Cancer (GDSC), the Cancer Cell Line Encyclopedia (CCLE), and the Genentech Cell Line Screening Initiative (GCSI). On imputation tasks with 20% to 80% missing data, we show that the proposed method TensorGenomic matches or outperforms state-of-the-art methods including the original tensor model and a multilevel mixed effects model. With 80% missing data, TensorGenomic improves the R^2 from 0.404 to 0.552 in the GDSC data set, 0.407 to 0.524 in the CCLE data set, and 0.331 to 0.453 in the GCSI data set compared to the tensor model which does not take into account genomic side information.
- Research Article
13
- 10.2174/1574893617666220609114052
- Nov 1, 2022
- Current Bioinformatics
Background: Anti-cancer drug response prediction is urgently required for individualized therapy. Measurements with wet experiments are costly and time-consuming. Artificial intelligence-based models are currently available for predicting drug response but still have challenges in prediction accuracy. Objective: Construct a model to predict drug response values for unknown cell lines and analyze drug potential association properties in sparse data. Methods: We propose a Neural Matrix Factorization (NeuMF) framework to help predict the unknown responses of cell lines to drugs. The model uses a deep neural network to learn the latent variables of drugs and cell lines. In NeuMF, the inputs and the parameters of the multi-layer neural network are simultaneously optimized by gradient descent to minimize the reconstruction errors between the predicted and true values of the observed entries. Then the unknown entries can be readily recovered by propagating the latent variables to the output layer. Results: Experiments on the Cancer Cell Line Encyclopedia (CCLE) dataset and Genomics of Drug Sensitivity in Cancer (GDSC) dataset compare NeuMF with three other state-of-the-art methods. NeuMF dispenses with constructing drug or cell line similarity matrices and instead mines the response matrix itself for correlations in the network, avoiding the inclusion of redundant noise. NeuMF obtained drug-averaged PCC_sr of 0.83 and 0.84 on the two datasets. This demonstrates that NeuMF substantially improves the prediction. Some essential parameters in NeuMF, such as the strategy of global effect removal and the scales of the input layer, are also discussed. Finally, case studies have shown that NeuMF can better learn the latent characteristics of drugs, e.g., Irinotecan and Topotecan are found to act on the same pathway TOP1. The conclusions are in line with some existing biological findings.
Conclusion: NeuMF achieves better prediction accuracy than existing models, and its output is biologically interpretable. NeuMF also helps analyze the correlations between drugs.
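The core idea in this abstract, learning latent variables by gradient descent on the observed entries only and then reading off the unknown entries, can be shown with a deliberately simplified sketch. It uses a plain inner-product matrix factorization instead of NeuMF's multi-layer neural network, so it is not the authors' model. The names `factorize` and `predict`, the hyperparameter values, and the L2 penalty are all assumptions.

```python
import random


def factorize(observed, n_rows, n_cols, rank=2, lr=0.05, reg=0.01, epochs=500, seed=0):
    """Fit R ~ U V^T on observed entries only, by stochastic gradient descent.

    observed: dict mapping (row, col) -> measured response.
    Returns the latent row factors U and column factors V.
    """
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    for _ in range(epochs):
        for (i, j), value in observed.items():
            err = value - sum(u * v for u, v in zip(U[i], V[j]))
            for f in range(rank):
                u, v = U[i][f], V[j][f]  # use pre-update values for both gradients
                U[i][f] += lr * (err * v - reg * u)
                V[j][f] += lr * (err * u - reg * v)
    return U, V


def predict(U, V, i, j):
    """Predicted response for row i and column j, including unobserved pairs."""
    return sum(u * v for u, v in zip(U[i], V[j]))
```

Because only observed (cell line, drug) pairs contribute to the updates, the sparsity of the response matrix is handled without imputing missing values first.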
- Research Article
159
- 10.1109/tcbb.2019.2919581
- Mar 1, 2021
- IEEE/ACM Transactions on Computational Biology and Bioinformatics
High-throughput screening technologies have provided a large amount of drug sensitivity data for a panel of cancer cell lines and hundreds of compounds. Computational approaches to analyzing these data can benefit anticancer therapeutics by identifying molecular genomic determinants of drug sensitivity and developing new anticancer drugs. In this study, we have developed a deep learning architecture to improve the performance of drug sensitivity prediction based on these data. We integrated both genomic features of cell lines and chemical information of compounds to predict the half maximal inhibitory concentrations (IC50) on the Cancer Cell Line Encyclopedia (CCLE) and the Genomics of Drug Sensitivity in Cancer (GDSC) datasets using a deep neural network, which we called DeepDSC. Specifically, we first applied a stacked deep autoencoder to extract genomic features of cell lines from gene expression data, and then combined the compounds' chemical features to these genomic features to produce final response data. We conducted 10-fold cross-validation to demonstrate the performance of our deep model in terms of root-mean-square error (RMSE) and coefficient of determination (R2). We show that our model outperforms the previous approaches with RMSE of 0.23 and R2 of 0.78 on CCLE dataset, and RMSE of 0.52 and R2 of 0.78 on GDSC dataset, respectively. Moreover, to demonstrate the prediction ability of our models on novel cell lines or novel compounds, we left cell lines originating from the same tissue and each compound out as the test sets, respectively, and the rest as training sets. The performance was comparable to other methods.
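The leave-one-tissue-out (or leave-one-compound-out) evaluation described at the end of this abstract amounts to a grouped split: every sample sharing a group label is held out together. A minimal sketch follows. The function name `leave_one_group_out` and the example labels are hypothetical.

```python
from collections import defaultdict


def leave_one_group_out(groups):
    """Yield (held_out_group, train_idx, test_idx), one split per distinct group.

    groups: one label per sample, e.g. the tissue of origin of each cell line
    or the compound of each (cell line, compound) pair.
    """
    by_group = defaultdict(list)
    for idx, label in enumerate(groups):
        by_group[label].append(idx)
    for label in sorted(by_group):
        test = by_group[label]
        train = [i for i in range(len(groups)) if groups[i] != label]
        yield label, train, test
```

Unlike random K-fold splitting, this never lets the model see any sample from the held-out tissue or compound during training, so it estimates performance on genuinely novel cell lines or drugs.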
- Research Article
1
- 10.1002/ima.23060
- Mar 1, 2024
- International Journal of Imaging Systems and Technology
The research aims to improve the prediction of drug sensitivity on cancer cell lines using gene expression data and molecular fingerprints of drugs. The proposed study uses a deep learning model, BioMarkerX, trained on the Cancer Cell Line Encyclopedia (CCLE) and Genomics of Drug Sensitivity in Cancer (GDSC) datasets utilizing Particle Swarm Optimization technique to select specific genes as features. The model achieves high prediction accuracy with a Root Mean Square Error (RMSE) of 0.40 ± 0.02 and R2 of 0.83 ± 0.03 on the CCLE dataset, and an RMSE of 0.36 ± 0.05 and R2 of 0.83 ± 0.03 on the GDSC dataset. The approach also used an explainable artificial intelligence model to discover biological markers linked to cancer development. This can provide insights into targeted therapies for improving cancer treatment outcomes. Overall, the study presents an effective approach for identifying important biological markers relevant to cancer disease, aiding in the development of more efficient anticancer medications.
- Research Article
1
- 10.21037/tgh-24-15
- Jan 1, 2025
- Translational gastroenterology and hepatology
Gastrointestinal adenocarcinomas (GIACs) are common malignant tumors with poor prognosis worldwide. Ferroptosis, characterized by the accumulation of intracellular iron and lipid reactive oxygen species, emerges as a pivotal process in tumorigenesis and cancer advancement. However, the implications of ferroptosis-related genes in GIAC remain to be elucidated. This study aimed at exploring the potential role of ferroptosis-related genes in the prognosis and treatment of GIAC. In our study, comprehensive clinical, transcriptomic, and/or genomic data were acquired from The Cancer Genome Atlas (TCGA), Cancer Cell Line Encyclopedia (CCLE), Genomics of Drug Sensitivity in Cancer (GDSC), and Gene Expression Omnibus (GEO). We formulated a ferroptosis-score within the TCGA cohort through gene set variation analysis (GSVA) and subsequently validated it in 4 GEO datasets (GSE84437, GSE17536, GSE103479, and GSE19417). Drug sensitivity and immunotherapy efficacy were analyzed in the GDSC dataset and the PRJEB25780 cohort, respectively. The ferroptosis-score was significantly associated with favorable overall survival both in the training cohort [TCGA: P=0.003; hazard ratio (HR), 0.67, 95% confidence interval (95% CI): 0.52-0.87] and across the four validation cohorts (GSE17536: P=0.03; HR, 0.57, 95% CI: 0.34-0.96; GSE19417: P=0.047; HR, 0.53, 95% CI: 0.28-1.01; GSE84437: P=0.004; HR, 0.68, 95% CI: 0.51-0.90; GSE103479: P=0.03; HR, 0.55, 95% CI: 0.32-0.96). Furthermore, the ferroptosis-score was correlated with activation of the DNA damage repair pathway and resistance to cisplatin. Notably, GIACs with low ferroptosis-scores exhibited heightened expression of immune checkpoint molecules such as programmed death-(ligand) 1 and cytotoxic T lymphocyte antigen-4, elevated densities of tumor-infiltrating CD8+ T cells, and a favorable response to pembrolizumab monotherapy.
Our findings delineated the clinical relevance of ferroptosis-related genes in GIACs and demonstrated the potential utility of the ferroptosis-score in predicting prognosis and immunotherapy effectiveness.
- Research Article
36
- 10.1186/s12859-021-03974-3
- Jan 28, 2021
- BMC bioinformatics
Background: Predicting the response of cancer cell lines to specific drugs is an essential problem in personalized medicine. Since drug response is closely associated with genomic information in cancer cells, some large panels of several hundred human cancer cell lines are organized with genomic and pharmacogenomic data. Although several methods have been developed to predict the drug response, there are many challenges in achieving accurate predictions. This study proposes a novel feature selection-based method, named Auto-HMM-LMF, to predict cell line-drug associations accurately. Because of the vast dimensions of the feature space for predicting the drug response, Auto-HMM-LMF focuses on the feature selection issue for exploiting a subset of inputs with a significant contribution. Results: This research introduces a novel method for feature selection of mutation data based on signature assignments and hidden Markov models. Also, we use the autoencoder models for feature selection of gene expression and copy number variation data. After selecting features, the logistic matrix factorization model is applied to predict drug response values. Besides, by comparing to one of the most powerful feature selection methods, the ensemble feature selection method (EFS), we showed that the performance of the predictive model based on selected features introduced in this paper is much better for drug response prediction. Two datasets, the Genomics of Drug Sensitivity in Cancer (GDSC) and Cancer Cell Line Encyclopedia (CCLE), are used to indicate the efficiency of the proposed method across unseen patient cell lines.
Evaluation of the proposed model showed that Auto-HMM-LMF could improve the accuracy of the results of the state-of-the-art algorithms, and it can find useful features for the logistic matrix factorization method. Conclusions: We depicted an application of Auto-HMM-LMF in exploring the new candidate drugs for head and neck cancer that showed the proposed method is useful in drug repositioning and personalized medicine. The source code of the Auto-HMM-LMF method is available at https://github.com/emdadi/Auto-HMM-LMF.
- Research Article
17
- 10.1186/s12920-019-0519-2
- Jun 17, 2019
- BMC Medical Genomics
Background: The availability and generation of large amounts of genomic data has led to the development of a new paradigm in cancer treatment emphasizing a precision approach at the molecular and genomic level. Statistical modeling techniques aimed at leveraging broad scale in vitro, in vivo, and clinical data for precision drug treatment have become an active area of research. As a rapidly developing discipline at the crossroads of medicine, computer science, and mathematics, techniques ranging from accepted to those on the cutting edge of artificial intelligence have been utilized. Given the diversity and complexity of these techniques, a systematic understanding of fundamental modeling principles is essential to contextualize influential factors to better understand results and develop new approaches. Methods: Using data available from the Genomics of Drug Sensitivity in Cancer (GDSC) and the NCI60, we explore principal components regression, linear and non-linear support vector regression, and artificial neural networks in combination with different implementations of correlation based feature selection (CBF) on the prediction of drug response for several cytotoxic chemotherapeutic agents. Results: Our results indicate that the regression method and features used have marginal effects on Spearman correlation between the predicted and measured values as well as prediction error. Detailed analysis of these results reveals that the bulk relationship between tissue of origin and drug response is a major driving factor in model performance. Conclusion: These results display one of the challenges in building predictive models for drug response in pan-cancer models. Mainly, bulk genotypic traits with a high signal-to-noise ratio dominate the behavior captured in these models. This suggests that improved techniques of feature selection that can discriminate individual cell response from histotype response will yield more successful pan-cancer models.
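Correlation based feature selection, as used in this abstract, can be sketched in its simplest form: rank each feature by the absolute Pearson correlation between its values and the measured drug response, then keep the top k. The abstract mentions several implementations without detailing them, so this variant, along with the names `pearson` and `select_top_k`, is an assumption for illustration only.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant vector: treat the correlation as zero
    return cov / (sx * sy)


def select_top_k(features, response, k):
    """Return the k feature names with the largest |Pearson r| to the response.

    features: dict mapping a feature name (e.g. a gene) to its values across
    cell lines, in the same order as `response`.
    """
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], response)),
        reverse=True,
    )
    return ranked[:k]
```

When this filter is combined with cross-validation, the ranking should be computed on the training folds only. Selecting features on the full dataset leaks information from the test samples into the model.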
- Research Article
113
- 10.1016/j.omtn.2020.07.003
- Jul 10, 2020
- Molecular Therapy. Nucleic Acids
An Improved Anticancer Drug-Response Prediction Based on an Ensemble Method Integrating Matrix Completion and Ridge Regression