Abstract

Drug sensitivity prediction constitutes one of the main challenges in personalized medicine. Critically, the sensitivity of cancer cells to treatment depends on an unknown subset of a large number of biological features. Here, we compare standard, data-driven feature selection approaches to feature selection driven by prior knowledge of drug targets, target pathways, and gene expression signatures. We assess these methodologies on the Genomics of Drug Sensitivity in Cancer (GDSC) dataset, evaluating 2484 unique models. For 23 drugs, better predictive performance is achieved when the features are selected according to prior knowledge of drug targets and pathways. The best correlation between observed and predicted response on the test set is achieved for Linifanib (r = 0.75). Extending the drug-dependent features with gene expression signatures yields the most predictive models for 60 drugs, with Dabrafenib as the best-performing example. For many compounds, even a very small subset of drug-related features is highly predictive of drug sensitivity. Small feature sets selected using prior knowledge are more predictive for drugs targeting specific genes and pathways, while models with wider feature sets perform better for drugs affecting general cellular mechanisms. Appropriate feature selection strategies facilitate the development of interpretable models that are informative for therapy design.
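To make the evaluation criterion concrete, here is a minimal sketch that trains a model on a restricted feature set and scores it by the Pearson correlation between observed and predicted responses on held-out cell lines. It rests on stated assumptions: ridge regression stands in for whatever model the study actually fits, the data are synthetic rather than GDSC measurements, and the first five columns play the role of hypothetical prior-knowledge features. The helper names (`fit_ridge`, `evaluate_feature_set`) are illustrative only.

```python
import numpy as np


def pearson_r(observed, predicted):
    """Pearson correlation between observed and predicted drug responses."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    o = o - o.mean()
    p = p - p.mean()
    return float(o @ p / np.sqrt((o @ o) * (p @ p)))


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept (assumed model)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    A = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    coef = np.linalg.solve(A, Xc.T @ yc)
    return coef, y_mean - x_mean @ coef


def evaluate_feature_set(X_train, y_train, X_test, y_test, feature_idx, alpha=1.0):
    """Fit on the selected columns only and return the test-set Pearson r."""
    coef, intercept = fit_ridge(X_train[:, feature_idx], y_train, alpha)
    y_pred = X_test[:, feature_idx] @ coef + intercept
    return pearson_r(y_test, y_pred)


# Synthetic stand-in for a cell-line-by-feature matrix (NOT GDSC data):
# 200 cell lines, 1000 features, response driven by the first 5 features.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 1000))
beta = np.zeros(1000)
beta[:5] = 1.0
y = X @ beta + 0.5 * rng.standard_normal(200)
X_train, X_test, y_train, y_test = X[:150], X[150:], y[:150], y[150:]

prior_idx = np.arange(5)    # hypothetical "drug target" features
all_idx = np.arange(1000)   # no selection: every feature
r_prior = evaluate_feature_set(X_train, y_train, X_test, y_test, prior_idx)
r_all = evaluate_feature_set(X_train, y_train, X_test, y_test, all_idx)
print(f"prior-knowledge features: r = {r_prior:.2f}")
print(f"all features:             r = {r_all:.2f}")
```

On this toy problem, with only 150 training samples and 1000 features, the small correctly chosen feature set should reach a much higher test correlation than the unselected model. This echoes, but does not reproduce, the observation that small prior-knowledge feature sets can be highly predictive.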

Highlights

  • Drug sensitivity prediction constitutes one of the main challenges in personalized medicine

  • We applied each feature selection approach; the approaches fall into two categories: biologically driven and automatic, data-driven selection methods

  • We considered the union of the direct target genes and the drug’s target pathway genes
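The biologically driven feature set described in these highlights, the union of a drug's direct target genes and its target pathway genes, optionally extended with gene expression signature genes, reduces to plain set operations. The sketch below is illustrative only: the annotation dictionaries are hypothetical (Dabrafenib is a BRAF inhibitor, but the pathway gene list is a hand-picked subset, not the study's curated annotation), EGFR is used as a hypothetical signature gene, and `prior_knowledge_features` and `column_indices` are made-up helper names.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical annotations for illustration only; not the study's curated data.
DRUG_TARGETS: Dict[str, Set[str]] = {"Dabrafenib": {"BRAF"}}
DRUG_PATHWAY: Dict[str, str] = {"Dabrafenib": "ERK MAPK signaling"}
PATHWAY_GENES: Dict[str, Set[str]] = {
    # Hand-picked subset of pathway members, not a complete pathway definition.
    "ERK MAPK signaling": {
        "KRAS", "NRAS", "BRAF", "RAF1", "MAP2K1", "MAP2K2", "MAPK1", "MAPK3",
    },
}


def prior_knowledge_features(drug: str, signature_genes: Iterable[str] = ()) -> Set[str]:
    """Union of direct targets, target-pathway genes and optional signature genes."""
    targets = DRUG_TARGETS.get(drug, set())
    pathway = PATHWAY_GENES.get(DRUG_PATHWAY.get(drug, ""), set())
    return targets | pathway | set(signature_genes)


def column_indices(genes: Set[str], feature_names: List[str]) -> List[int]:
    """Map gene symbols to matrix columns, skipping genes that were not measured."""
    position = {name: i for i, name in enumerate(feature_names)}
    return sorted(position[g] for g in genes if g in position)


feature_names = ["TP53", "BRAF", "MAPK1", "EGFR", "MAP2K1", "KRAS"]
drug_features = prior_knowledge_features("Dabrafenib")
print(column_indices(drug_features, feature_names))  # [1, 2, 4, 5]
# EGFR acts as a hypothetical expression-signature gene here.
extended = prior_knowledge_features("Dabrafenib", signature_genes={"EGFR"})
print(column_indices(extended, feature_names))       # [1, 2, 3, 4, 5]
```

Genes in the annotation that are absent from the measured feature matrix are dropped rather than raising an error, which keeps the same annotation usable across datasets with different gene coverage.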


Summary

Introduction

Drug sensitivity prediction constitutes one of the main challenges in personalized medicine. We compare standard, data-driven feature selection approaches to feature selection driven by prior knowledge of drug targets, target pathways, and gene expression signatures. We assess these methodologies on the Genomics of Drug Sensitivity in Cancer (GDSC) dataset, evaluating 2484 unique models. A multi-task learning approach based on a Bayesian model for collaborative filtering was proposed [23], which allows for identifying general interactions between features of the drugs and features of the cell lines. It yields insights of the form “activation of pathway Y will confer sensitivity to any drug targeting protein X”. Stability selection was proposed to mitigate this problem when regularized regression is applied [27], but it still comes without a guarantee of choosing the most biologically relevant predictive features.
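Stability selection, as cited here, repeatedly fits a sparse regression on random subsamples of the data and keeps only the features selected in a large fraction of them. The following is a minimal numpy sketch of the basic subsampling variant with a Lasso solved by cyclic coordinate descent. The single regularization value `alpha`, the 0.6 frequency threshold, the half-sample size and the synthetic data are assumptions for illustration; the original formulation considers selection probabilities over a range of regularization values. The names `lasso_cd` and `stability_selection` are illustrative, not code from the cited work.

```python
import numpy as np


def lasso_cd(X, y, alpha, n_iter=100):
    """Lasso by cyclic coordinate descent.

    Minimizes (1 / (2n)) * ||y - X w||^2 + alpha * ||w||_1 for centered X and y.
    """
    n, p = X.shape
    w = np.zeros(p)
    residual = np.array(y, dtype=float)      # y - X @ w, with w = 0
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual that excludes feature j.
            rho = X[:, j] @ residual / n + col_sq[j] * w[j]
            new_wj = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            residual -= X[:, j] * (new_wj - w[j])
            w[j] = new_wj
    return w


def stability_selection(X, y, alpha, n_subsamples=50, threshold=0.6, seed=0):
    """Keep features whose Lasso selection frequency over half-subsamples >= threshold."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=n // 2, replace=False)
        Xs = X[idx] - X[idx].mean(axis=0)
        std = Xs.std(axis=0)
        std[std == 0] = 1.0
        Xs = Xs / std                        # standardize within the subsample
        ys = y[idx] - y[idx].mean()
        counts += (lasso_cd(Xs, ys, alpha) != 0)
    freq = counts / n_subsamples
    return np.flatnonzero(freq >= threshold), freq


# Synthetic example (not GDSC data): 3 informative features out of 50.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 50))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 1.0 * X[:, 2] + 0.5 * rng.standard_normal(200)
selected, freq = stability_selection(X, y, alpha=0.3)
print("stably selected features:", selected)
```

Selection frequency measures how robustly a feature is picked under resampling, not whether it is biologically meaningful, which is the limitation noted above.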

